Author Topic: [Ndless] Help with bottleneck on color calcs only  (Read 11846 times)

Offline DJ Omnimaga

  • Former TI programmer
  • CoT Emeritus
  • LV15 Omnimagician (Next: --)
  • *
  • Posts: 55918
  • Rating: +3152/-232
  • CodeWalrus founder & retired Omnimaga founder
Re: [Ndless] Help with bottleneck on color calcs only
« Reply #30 on: June 27, 2014, 06:17:10 pm »
Yeah, but wouldn't it still be faster to copy the data to the grayscale screen, since it only supports 4 bits per pixel, for example by converting the data beforehand or stripping the extra bits? I'm not sure how the grayscale Nspire screens work, though, so maybe I'm just misunderstanding something.

Offline Vogtinator

  • LV9 Veteran (Next: 1337)
  • *********
  • Posts: 1192
  • Rating: +108/-5
  • Instruction counter
« Reply #31 on: June 27, 2014, 06:17:36 pm »
I have a theory:
The frame buffer contents mustn't be cached, as that may cause artifacts. So caching is disabled by the OS.
On GS calcs the screen buffer is in SRAM, so SRAM caching is disabled. But it's active on SDRAM writes and reads.
On CX calcs the screen buffer is somewhere in SDRAM, and the buffer set by the OS has caching disabled. So writes and reads will be slower.
What happens if you allocate two buffers for both calcs so the default one is never used?

Offline Matrefeytontias

  • Axe roxxor (kinda)
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1982
  • Rating: +310/-12
  • Axe roxxor
« Reply #32 on: June 27, 2014, 06:20:15 pm »
DJ: what you're missing is that the program is already 6 times faster (according to aeTIos's latest tests) on GS calcs than on color calcs at nearly the same frequency (120 MHz for GS vs 132 MHz for color).

Vogtinator: I was wondering about that. Will test when I can (probably in a few minutes).

Offline Adriweb

  • Editor
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1708
  • Rating: +229/-17
« Reply #33 on: June 27, 2014, 06:22:45 pm »
Quote
DJ: what you're missing is that the program is already 6 times faster (according to aeTIos's latest tests) on GS calcs than on color calcs at nearly the same frequency (120 MHz for GS vs 132 MHz for color).

In case you didn't see:
Quote
I decided to un-overclock my calc and re-run copyscreen1000times. 12.53 seconds.

(Or well, I'm not sure where you see this '6x' faster :P)

GL&HF anyway ^^
My calculator programs
TI-Planet.org co-admin.
TI-Nspire Lua programming : Tutorials  |  API Documentation

Offline Matrefeytontias

« Reply #34 on: June 27, 2014, 06:24:27 pm »
12 ≈ 6 * 2, if I still know how to count. aeTIos's tests were run on a color calc and give 12 seconds; pimath's tests were run on a GS calc and give 2 seconds.

Offline aeTIos

  • Nonbinary computing specialist
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 3913
  • Rating: +184/-32
« Reply #35 on: June 27, 2014, 06:26:42 pm »
I don't know if pimath's calc is OCed, though. Even then, the max speed doesn't go much over 150 MHz. So yeah, something really strange is going on.
I'm not a nerd but I pretend:

Offline Adriweb

« Reply #36 on: June 27, 2014, 06:32:10 pm »
Well, it's actually closer to 5 times, but OK, as long as the tests are consistent it's good.

Offline pimathbrainiac

  • Occasionally I make projects
  • Members
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1731
  • Rating: +136/-23
  • dagaem
« Reply #37 on: June 27, 2014, 06:36:13 pm »
I have not OC'd. My calc came with a non-downgradable OS, so I updated to 3.6, then ndless'd.
I am Bach.

Offline Matrefeytontias

« Reply #38 on: June 27, 2014, 08:03:08 pm »
So I came up with this, which actually seems to give 167 FPS at any frequency on a color calc (I guess this is the maximum the color Nspire's LCD can give): the 1000 copies take a mere 6 seconds at 132 MHz as well as at 262 MHz, vs 12.58 seconds at 132 MHz and a bit more than 6 seconds at 262 MHz previously (all of these are aeTIos's tests).

The only problem: it crashes on exit (that is, after 6 seconds), and I have no idea why.

Code:
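#include <os.h> // Ndless SDK header, assumed to provide malloc, free, memcpy, exit and is_classic

// BUFF_BYTES_SIZE and clearBufferB() are defined elsewhere in the project (not shown in this post)
unsigned short *BUFF_BASE_ADDRESS; // off-screen buffer the program draws into
void *SCREEN_BACKUP;               // framebuffer address set by the OS, restored on exit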

void initBuffering()
{
   void *temp;
   temp = malloc(BUFF_BYTES_SIZE);
   if(!temp)
      exit(0);
   BUFF_BASE_ADDRESS = (unsigned short*)malloc(BUFF_BYTES_SIZE);
   if(!BUFF_BASE_ADDRESS)
   {
      free(temp);
      exit(0);
   }
   
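   // 0xC0000010 is the LCD controller's framebuffer base address register
   SCREEN_BACKUP = *(void**)0xC0000010;
   
   // Classic (grayscale) models: set the bits-per-pixel field of the LCD control register to 16 bpp
   if(is_classic)
      *(int32_t*)0xC000001C = (*(int32_t*)0xC000001C & ~0x0e) | 0x08;
   
   // Point the LCD at our own buffer so the OS's default one is never used
   *(void**)0xC0000010 = temp;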
}

void updateScreen()
{
   // Screen-access delays make this the fastest method apparently
   memcpy(*(void**)0xC0000010, BUFF_BASE_ADDRESS, BUFF_BYTES_SIZE);
}

void deinitBuffering()
{
   void *temp = *(void**)0xC0000010;
   // Classic (grayscale) models: switch the LCD control register back to 4 bpp
   if(is_classic)
      *(int32_t*)0xC000001C = (*(int32_t*)0xC000001C & ~0x0e) | 0x04;
   *(void**)0xC0000010 = SCREEN_BACKUP;
   free(temp);
   free(BUFF_BASE_ADDRESS);
}

int main(void)
{
   int i;
   
   initBuffering();
   clearBufferB();
   
   for(i = 0; i < 1000; i++)
   {
      memcpy(*(void**)0xC0000010, BUFF_BASE_ADDRESS, BUFF_BYTES_SIZE);
   }
   
   deinitBuffering();
   return 0;
}

Binaries attached, but it does crash your calc.

EDIT: I did some tests with the current version of nKaruga (source available on GitHub). The game runs at 95 FPS on my 120 MHz grayscale TI-Nspire CAS with Ndless 3.1 r914, measured by timing how long it takes to cross 320 pixels moving one pixel per frame: 3.4 seconds.
« Last Edit: June 27, 2014, 08:14:17 pm by Matrefeytontias »

Offline Streetwalrus

  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 3821
  • Rating: +80/-8
« Reply #39 on: June 28, 2014, 03:42:01 am »
I did one last experiment today. I was getting the same results as aeTIos. I tried with the CPU at 246 MHz and the AHB at 35 MHz: the thing crashes after 12 seconds, which makes it even clearer that memory is the bottleneck.
aeTIos tried different scaling modes in gpSP, and unscaled is much faster, not only because it skips the scaling but also for the same reason.
« Last Edit: June 28, 2014, 04:01:03 am by Streetwalrus »

Offline aeTIos

« Reply #40 on: June 28, 2014, 04:05:59 am »
I have not yet tested downclocking my AHB in gpSP-nspire; I'm doing that right now.

Offline Vogtinator

« Reply #41 on: June 30, 2014, 05:12:21 pm »
I just tried it on my CX CAS 3.1: the program doesn't crash for me, and it takes ~6 seconds, i.e. ~167 FPS.
I don't know how to interpret that. I don't understand this part:
Quote
giving a mere 6 seconds at 132 MHz as well as 262 MHz, vs 12.58 seconds at 132 MHz and a bit more than 6 seconds at 262 MHz previously
???

Edit: Could you post a program that writes 1000 times to something other than the screen? I don't think it would make a difference, but just in case...
« Last Edit: June 30, 2014, 05:17:30 pm by Vogtinator »

Offline Matrefeytontias

« Reply #42 on: July 01, 2014, 07:34:17 am »
I meant that with this version, the program executes in 6 seconds whether the calc has been clocked to 132 or 262 MHz, whereas with the previous version, the program ran in 12.58 seconds when the calc was clocked to 132 MHz and in 6 seconds when it was clocked at 262 MHz. So the newest version is better.

I'll do that in a minute.

Offline aeTIos

« Reply #43 on: July 01, 2014, 07:35:09 am »
Quote
I meant that with this version, the program executes in 6 seconds whether the calc has been clocked to 132 or 262 MHz, whereas with the previous version, the program ran in 12.58 seconds when the calc was clocked to 132 MHz and in 6 seconds when it was clocked at 262 MHz. So the newest version is better.

I'll do that in a minute.

Note that the 132 MHz program also ran at AHB = 33...

Offline Matrefeytontias

« Reply #44 on: July 01, 2014, 07:36:25 am »
Well yeah, at this point CPU speed doesn't matter because it's huge compared to AHB speed (only when the program does nothing but write to memory, of course).