Omnimaga

Calculator Community => TI Calculators => Calculator C => Topic started by: Matrefeytontias on June 26, 2014, 06:25:50 am

Title: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 26, 2014, 06:25:50 am
Hi guys,

I've been working on nKaruga for quite some time now, and although it's reasonably fast on monochrome TI-Nspires, it's almost exactly twice as slow on color calcs, even though the same code is running and the game configures the monochrome screen in 16-bit mode.

I have no idea why it's like this, I keep trying things and nothing changes. I even removed every drawing command except the screen update and the ship, and the speed stays the same. Since I'm really, really stuck, I thought of asking you guys.

The full source is here : https://github.com/matrefeytontias/nKaruga

It's also possible that it's n2DLib's fault, since it's used to interact with the screen : https://github.com/n2DLib/n2DLib

Feel free to fork it and submit PR, I'll be very happy to see things fixed because I don't know what to do.

Also, I have no color calc to test on, only a monochrome one. Several people ran tests for me.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 26, 2014, 10:31:23 am
Does nspire_emu show the same symptoms?
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 26, 2014, 10:40:46 am
Nope, nspire_emu tells me the game is equally fast on both screen types.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 08:09:52 am
Quote
I've been working on nKaruga for quite some time now, and although it's reasonably fast on monochrome TI-Nspires, it's almost exactly twice as slow on color calcs, even though the same code is running and the game configures the monochrome screen in 16-bit mode.
So you don't handle those different LCDs differently? Using the monochrome LCD in 16bpp mode like a color one looks awful.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on June 27, 2014, 08:40:34 am
I've run a few tests and I'm fairly sure it's n2DLib's sprite drawing function. I already tested IkarugaX for you, and the time it takes to scroll a sprite across the screen is almost the same, namely 6.55 seconds. When I take away the fullscreen pic in the background, it's suddenly three times as fast: 1.91 seconds.

It was mentioned on IRC that the first versions of n2DLib (nFastGraphX) supposedly were faster (Jetpack Impossible 2). This has been proven wrong by porting the source to the newest version of n2DLib, see attached files. Hayleia told me that pictures are drawn by copying them pixel by pixel to the screen. I don't know exactly what the Nspire can and can't do, but I think you'll have to thoroughly rethink the way you are drawing sprites.
I really hope you can fix the lib though, I love how easy to use it is.


EDIT: Of course, this does not explain why it is almost twice as fast on the b/w series nspire, but because both methods for drawing (both with and without buffer) are about the same speed (should I say slow? that'd be a lame joke), maybe it's the display driver? O.O But no, that's ridiculous, gpsp can run F-Zero at much higher speeds <_<
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Hayleia on June 27, 2014, 09:02:07 am
Now that I think of it, which version of Jetpack Impossible did you use to do your tests ? Because there was one ridiculously slow version (using double buffering) that was attached to a post by Matref in that topic, but the version I said could possibly use fast routines was the one you can find in the TI Planet archives that this post (http://www.omnimaga.org/ti-nspire-projects/%28ndless%29-jetpack-impossible/) links to.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on June 27, 2014, 09:16:19 am
Oh. I used the slow version. I wish I never found the fast version D: j/k.

So disregard what I said above about "proven wrong", the version Hayleia linked here is quite a bit faster. Holy Cow.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Streetwalrus on June 27, 2014, 10:36:48 am
Code: [Select]
void set_display_buffer(void* buffer)
{
    *(volatile void**)0xC0000010 = buffer;
}

void update_at_vblank()
{
    while((*(volatile unsigned*)0xC0000020 & 4) == 0) { }
    *(volatile unsigned*)0xC0000028 = 4;
    set_display_buffer(nspire_displayed_screen);
}

aeTIos dug that out of the gpSP source and pasted it on IRC; I think it might be useful. Basically buffer swapping.
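
For anyone who hasn't seen buffer swapping before, here is a minimal host-side sketch of the idea. It is not gpSP's or n2DLib's code: FlipChain and the flipchain_* functions are made-up names, and the "scanout" field only stands in for the address the snippet above writes to 0xC0000010. The point is that presenting a frame exchanges two pointers instead of copying 320*240 pixels.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FB_WIDTH  320
#define FB_HEIGHT 240

/* Hypothetical stand-in for the LCD controller: on real hardware "scanout"
   would be the value written to the register at 0xC0000010, not a field. */
typedef struct {
    uint16_t *scanout; /* buffer currently being displayed */
    uint16_t *draw;    /* buffer the game renders into */
} FlipChain;

/* Allocate both buffers; returns 0 on success, -1 on allocation failure. */
static int flipchain_init(FlipChain *fc)
{
    size_t bytes = (size_t)FB_WIDTH * FB_HEIGHT * sizeof(uint16_t);
    assert(fc != NULL);
    fc->scanout = calloc(1, bytes);
    fc->draw = calloc(1, bytes);
    if (!fc->scanout || !fc->draw) {
        free(fc->scanout);
        free(fc->draw);
        fc->scanout = fc->draw = NULL;
        return -1;
    }
    return 0;
}

/* Present the finished frame by exchanging the two pointers: no pixel copy. */
static void flipchain_swap(FlipChain *fc)
{
    uint16_t *tmp;
    assert(fc != NULL);
    tmp = fc->scanout;
    fc->scanout = fc->draw;
    fc->draw = tmp;
}

/* Release both buffers. */
static void flipchain_free(FlipChain *fc)
{
    assert(fc != NULL);
    free(fc->scanout);
    free(fc->draw);
    fc->scanout = fc->draw = NULL;
}
```

On the calc, the swap would be followed by writing the new scanout pointer to the LCD controller once the vblank bit is set, as in the gpSP snippet; the sketch leaves the hardware access out so it can be compiled anywhere.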
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Hayleia on June 27, 2014, 11:22:06 am
Well, matref and pierrot already tried that, but for some reason it was slow too -.-
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 12:45:09 pm
Quote
So you don't handle those different LCDs differently? Using the monochrome LCD in 16bpp mode like a color one looks awful.
<_< yeah well you could avoid these useless comments. I do that because it's faster since you're not required to test if the screens are different. And the problem right now is that it's 3 times slower on the color calcs than on the monochrome calcs, so using the monochrome LCD in 16 bpp mode is apparently a good idea.

Quote
I've run a few tests and I'm fairly sure it's n2DLib's sprite drawing function.
I already sent you a binary where no sprites were drawn, and you told me the speed was the same.

Quote
I already tested IkarugaX for you, and the time it takes to scroll a sprite across the screen is almost the same, namely 6.55 seconds. When I take away the fullscreen pic in the background, it's suddenly three times as fast: 1.91 seconds.
*nKaruga

Well that's kinda strange because that's supposed to be super-fast, as it's barely slower than just clearing the screen - except if accessing arrays is really that slow.

Quote
Well, matref and pierrot already tried that, but for some reason it was slow too -.-
Mh, I never tested bit 2 of the LCD driver. I'll try that.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 02:58:38 pm
Could you do something entirely without n2dlib, just directly memcpy one buffer into the other 500 times and compare that? Both LCDs in 16bpp mode of course.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 03:45:17 pm
So you mean copying a buffer to the screen 500 times, and comparing that with copying a buffer into another buffer 500 times ?
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 03:46:15 pm
No, run the program on CX and not-CX.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 04:15:07 pm
Quote
Also, I have no color calc to test on, only a monochrome one.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Streetwalrus on June 27, 2014, 04:17:03 pm
Also, you have plenty of color calc owners to test it for you. Just give the thingy to us. :P
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 04:27:08 pm
Alright so that's what to test. It's simply memcpy-ing a buffer to the screen 1000 times.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Streetwalrus on June 27, 2014, 05:08:45 pm
This executed in about 6.5 seconds for me.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: pimathbrainiac on June 27, 2014, 05:12:28 pm
2.5 seconds on the GS CAS calc.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Streetwalrus on June 27, 2014, 05:13:22 pm
Holy cow that's quite the difference. O.O So it's got to do with writing to the LCD being slow I guess.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 05:15:04 pm
But I never saw anything like that happen before ... There must be something Nspire-related I forgot to set up ... Could TCT_Local_Control_Interrupts(int) be the cause of this ?
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Streetwalrus on June 27, 2014, 05:16:05 pm
Protip : always disable OS crap before doing anything. :P
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 05:39:53 pm
I had to disable all IRQs and FIQs to get rid of the clock on screen, but TCT_Local_Control_Interrupts(0) should do the same.
But normally ndless itself does that already.

And matref, are you copying directly to SCREEN_BASE_ADDRESS? On GS calcs it's in SRAM and on CX it's in SDRAM because of its size.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 05:53:35 pm
I'm indeed copying to *(void**)0xC0000010. So would allocating a buffer in RAM and using it even on color calcs fix the speed issue ? (hopeless try)
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 05:55:28 pm
You shouldn't be able to write to SRAM if you're in 16bpp mode and filling the screen.
SRAM is 0xA4000000 - 0xA4020000 = 128 KiB, and 320*240*16 bits = 150 KiB.
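
The arithmetic can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, the SRAM range is the one quoted in the post, and the 4 bpp figure assumes the grayscale screen's native mode is 4 bits per pixel, as mentioned elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one full frame at the given resolution and depth. */
static size_t framebuffer_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    size_t bits = width * height * bits_per_pixel;
    assert(bits % 8 == 0); /* whole number of bytes expected */
    return bits / 8;
}

/* Size in bytes of the half-open address range [start, end). */
static size_t region_bytes(unsigned long start, unsigned long end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/* 1 if a frame of that size fits entirely inside the region, else 0. */
static int frame_fits(size_t frame, size_t region)
{
    return frame <= region;
}
```

With the numbers from the post, region_bytes(0xA4000000UL, 0xA4020000UL) is 131072 bytes (128 KiB) and framebuffer_bytes(320, 240, 16) is 153600 bytes (150 KiB), so a full 16bpp frame does not fit, while a 4 bpp frame (38400 bytes) does.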
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 05:58:06 pm
On monochrome calcs, I malloc two 320*240*2 buffers instead of one, one being an actual buffer and one acting as the screen. On color calcs I only allocate one buffer and use the default screen buffer.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 06:02:15 pm
I'm doing the same thing in nGL, but actually a third buffer, so the screen buffer is still usable after inversion.
So is the GS's SDRAM indeed that much slower?
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 06:03:55 pm
Well, I don't know but I'll try that, and if that's it I'll call it another thing that we should know.

/me would be so happy if that was the actual problem

EDIT : wait a minute you misunderstood. The program is 2.5 times faster on GS calcs than on color calcs, not the other way around.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on June 27, 2014, 06:07:15 pm
I decided to un-overclock my calc and re-run copyscreen1000times: 12.53 seconds.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: DJ Omnimaga on June 27, 2014, 06:12:02 pm
Quote
But I never saw anything like that happen before ... There must be something Nspire-related I forgot to set up ... Could TCT_Local_Control_Interrupts(int) be the cause of this ?

Could it be because of the same logic as with how slow it is to draw TI-84 Plus C Silver Edition pixels compared to TI-84 Plus Silver Edition ones due to the pixel data being much larger? I know the Nspire calcs are much faster and that the color ones can be overclocked higher, but still.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 06:13:00 pm
As I said, I set up both the GS and color screens so they use the same number of bits per pixel, namely 16.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: DJ Omnimaga on June 27, 2014, 06:17:10 pm
Yeah but wouldn't it still be faster to paste the data on the grayscale screen since it only supports 4 bits, such as converting the data beforehand or stripping the extra data? Not sure how the grayscale Nspire screens work, though, so maybe I'm just misunderstanding something.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 27, 2014, 06:17:36 pm
I have a theory:
The frame buffer contents mustn't be cached, as that may cause artifacts. So caching is disabled by the OS.
On GS calcs the screen buffer is in SRAM, so SRAM caching is disabled. But it's active on SDRAM writes and reads.
On CX calcs the screen buffer is somewhere in SDRAM, and the buffer set by the OS has caching disabled. So writes and reads will be slower.
What happens if you allocate two buffers for both calcs so the default one is never used?
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 06:20:15 pm
DJ : what you don't understand is that the program is already 6 times faster (according to aeTIos's latest tests) on GS calcs than on color calcs at nearly the same frequency (120 MHz for GS vs 132 MHz for color).

Vogtinator : I was wondering that. Will test when I can (probably in some minutes).
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Adriweb on June 27, 2014, 06:22:45 pm
Quote
DJ : what you don't understand is that the program is already 6 times faster (according to aeTIos's latest tests) on GS calcs than on color calcs at nearly the same frequency (120 MHz for GS vs 132 MHz for color).
In case you didn't see :
Quote
I decided to un-overclock my calc and re-run copyscreen1000times: 12.53 seconds.

(Or well, I'm not sure where you see this '6x' faster :P)

GL&HF anyway ^^
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 06:24:27 pm
12 ≈ 6 * 2 if I still know how to count. aeTIos's tests were made on a color calc and give 12 seconds, pimath's tests were done on a GS calc and give 2 seconds.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on June 27, 2014, 06:26:42 pm
I don't know if pimath's calc is OCed though. Even then, the max speed doesn't go much over 150 MHz. So yeah, something really strange is going on.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Adriweb on June 27, 2014, 06:32:10 pm
Well, it's actually closer to 5 times, but OK, as long as tests are consistent it's good.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: pimathbrainiac on June 27, 2014, 06:36:13 pm
I have not OC'd. My calc came with a non-downgrade-able OS, so I updated to 3.6, then ndless'd
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on June 27, 2014, 08:03:08 pm
So I came up with this, which actually seems to give 167 FPS (I guess this is the maximum the color Nspire's LCD can give) at any frequency on a color calc ; giving a mere 6 seconds at 132 MHz as well as 262 MHz, vs 12.58 seconds at 132 MHz and a bit more than 6 seconds at 262 MHz previously (all are aeTIos's tests).

Only problem, it crashes on exit (that is, after 6 seconds), and I have no idea why.

Code: [Select]
unsigned short *BUFF_BASE_ADDRESS;
void *SCREEN_BACKUP;

void initBuffering()
{
   void *temp;
   temp = malloc(BUFF_BYTES_SIZE);
   if(!temp)
      exit(0);
   BUFF_BASE_ADDRESS = (unsigned short*)malloc(BUFF_BYTES_SIZE);
   if(!BUFF_BASE_ADDRESS)
   {
      free(temp);
      exit(0);
   }
   
   SCREEN_BACKUP = *(void**)0xC0000010;
   
   // Handle monochrome screens-specific shit
   if(is_classic)
      *(int32_t*)0xC000001C = (*(int32_t*)0xC000001C & ~0x0e) | 0x08;
   
   *(void**)0xC0000010 = temp;
}

void updateScreen()
{
   // Screen-access delays make this the fastest method apparently
   memcpy(*(void**)0xC0000010, BUFF_BASE_ADDRESS, BUFF_BYTES_SIZE);
}

void deinitBuffering()
{
   void *temp = *(void**)0xC0000010;
   // Handle monochrome screens-specific shit again
   if(is_classic)
      *(int32_t*)0xC000001C = (*(int32_t*)0xC000001C & ~0x0e) | 0x04;
   *(void**)0xC0000010 = SCREEN_BACKUP;
   free(temp);
   free(BUFF_BASE_ADDRESS);
}

int main(void)
{
   int i;
   
   initBuffering();
   clearBufferB();
   
   for(i = 0; i < 1000; i++)
   {
      memcpy(*(void**)0xC0000010, BUFF_BASE_ADDRESS, BUFF_BYTES_SIZE);
   }
   
   deinitBuffering();
   return 0;
}

Binaries attached, but it does crash your calc.
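
A side note on the two is_classic lines above: both rewrite the same field (bits 3:1, mask 0x0e) of the register at 0xC000001C, setting 0x08 on init and 0x04 on exit. The sketch below isolates that read-modify-write on a plain integer so it can be checked off the calc. The macro and function names are invented here, and the "4 bpp" and "16 bpp" labels are inferred from the thread (the game puts the monochrome screen in 16-bit mode and restores it on exit), so treat them as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_BPP_MASK 0x0Eu /* bits 3:1, the field cleared by ~0x0e in the post */
#define LCD_BPP_4    0x04u /* value restored on exit in the post (assumed 4 bpp) */
#define LCD_BPP_16   0x08u /* value set on init in the post (assumed 16 bpp) */

/* Return reg with only the bits-per-pixel field replaced;
   every other bit of the control word is left untouched. */
static uint32_t lcd_with_bpp(uint32_t reg, uint32_t bpp_bits)
{
    assert((bpp_bits & ~LCD_BPP_MASK) == 0); /* value must lie inside the field */
    return (reg & ~LCD_BPP_MASK) | bpp_bits;
}
```

For example, lcd_with_bpp(0x1825, LCD_BPP_16) gives 0x1829, and applying LCD_BPP_4 to that result gives 0x1825 back, which is the round trip initBuffering() and deinitBuffering() perform on the real register.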

EDIT : did some tests with the current version of nKaruga (source accessible from github). The game runs at 95 FPS on my 120 MHz grayscale TI-Nspire CAS with Ndless 3.1 r914, calculated by seeing that you take 3.4 seconds to cross 320 pixels by moving one pixel by one pixel.
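
The frame rates quoted in this post follow from simple division, sketched here with hypothetical helper names. The throughput helper is an addition for illustration: it assumes each frame moves one full 320*240*2-byte buffer, which matches the memcpy test but not necessarily a full game frame.

```c
#include <assert.h>

/* Frames (or one-pixel steps) completed per second. */
static double frames_per_second(double frames, double seconds)
{
    assert(seconds > 0.0);
    return frames / seconds;
}

/* Bytes moved per second if every frame copies bytes_per_frame bytes. */
static double bytes_per_second(double frames, double seconds, double bytes_per_frame)
{
    return frames_per_second(frames, seconds) * bytes_per_frame;
}
```

frames_per_second(1000, 6) is about 166.7, the "167 FPS" above, and frames_per_second(320, 3.4) is about 94, which the post rounds to 95. Under the stated assumption, bytes_per_second(1000, 6, 153600) works out to roughly 25.6 million bytes per second written to the screen buffer.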
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Streetwalrus on June 28, 2014, 03:42:01 am
I did one last experiment today. I was getting the same results as aeTIos. I tried with the CPU at 246 MHz and AHB at 35 MHz, and the thing crashes after 12 seconds, which makes it even clearer that it's memory that causes the bottleneck.
aeTIos tried different scaling modes in gpSP and unscaled is much faster, not only because of the scaling but also for the same reason.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on June 28, 2014, 04:05:59 am
I have not yet tested downclocking my AHB in gpSP-nspire, I'm doing that right now.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Vogtinator on June 30, 2014, 05:12:21 pm
I just tried it on my CX CAS 3.1, the program doesn't crash for me and it takes ~6 seconds = ~167 fps.
I don't know how to interpret it, I don't understand
Quote
giving a mere 6 seconds at 132 MHz as well as 262 MHz, vs 12.58 seconds at 132 MHz and a bit more than 6 seconds at 262 MHz previously
???

Edit: Could you post a program that writes 1000 times to something other than the screen? I don't think it could make a difference, but just in case..
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on July 01, 2014, 07:34:17 am
I meant that with this version, the program executes in 6 seconds whether the calc has been clocked to 132 or 262 MHz, whereas with the previous version, the program ran in 12.58 seconds when the calc was clocked to 132 MHz and in 6 seconds when it was clocked at 262 MHz. So the newest version is better.

I'll do that in a minute.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on July 01, 2014, 07:35:09 am
Quote
I meant that with this version, the program executes in 6 seconds whether the calc has been clocked to 132 or 262 MHz, whereas with the previous version, the program ran in 12.58 seconds when the calc was clocked to 132 MHz and in 6 seconds when it was clocked at 262 MHz. So the newest version is better.

I'll do that in a minute.
Note that the 132 MHz program also ran at AHB = 33...
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on July 01, 2014, 07:36:25 am
Well yeah, at this point CPU speed isn't important because it's huge compared to AHB speed (only when it comes to writing memory and doing only that of course).
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: aeTIos on July 01, 2014, 07:39:07 am
What I mean to point out is that while you seem to think the newer one is better, I accidentally ran it at the wrong AHB and thus memory speed. This means the old version is no worse than the newest version. Tests that I ran at AHB = 66 MHz with the old version support this.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on July 01, 2014, 07:40:23 am
Hum yeah, well we'll see how it goes with writing to "normal" memory.
Title: Re: [Ndless] Help with bottleneck on color calcs only
Post by: Matrefeytontias on July 01, 2014, 07:47:26 am
Bump,

so this copies a 320*240*2-byte buffer in RAM to another similar buffer in RAM 1000 times. Start timing when the screen goes black, one second after you run the program.

http://www.mirari.fr/xKZq
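
For reference, a RAM-to-RAM copy test of this kind can be sketched in portable C as follows. This is not the linked program: the function name is invented, timing uses the standard clock() instead of a stopwatch, and it is meant to be built for a PC rather than for Ndless. The copy goes through a volatile function pointer so an optimizing compiler cannot merge or drop the repeated memcpy calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define FRAME_BYTES (320u * 240u * 2u) /* one 16bpp frame, as in the post */

/* Copy one frame-sized buffer into another `iterations` times.
   Returns the elapsed CPU time in seconds, or -1.0 if allocation fails. */
static double time_ram_copies(unsigned iterations, uint8_t fill)
{
    /* Calling through a volatile pointer keeps every copy in the binary. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    uint8_t *src = malloc(FRAME_BYTES);
    uint8_t *dst = malloc(FRAME_BYTES);
    clock_t start, end;
    unsigned i;
    int same;

    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, fill, FRAME_BYTES);
    memset(dst, 0, FRAME_BYTES);

    start = clock();
    for (i = 0; i < iterations; i++)
        copy(dst, src, FRAME_BYTES);
    end = clock();

    /* Sanity check: after at least one copy the buffers must match. */
    same = memcmp(dst, src, FRAME_BYTES) == 0;
    free(src);
    free(dst);
    assert(iterations == 0 || same);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_ram_copies(1000, 0xAB) mirrors the 1000-copy run described above. Comparing its result with a version whose destination is the screen buffer is what would separate "memory is slow" from "the screen buffer is slow".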