From what I recall, Jim's mapper prerotates the tileset for speed, so in that case it would need RAM proportional to the number of tiles you had.
I'm sure I have it at home; I'll look when I finish work.
It works on the same principle as DWedit's mapper, but Jim added layers etc.
EDIT - Didn't calc84maniac write a layered mapper as well?
In the past I've used a 2KB LUT to save on having to rotate each tile byte: basically every byte value 0-255 prerotated for each of the 8 possible shift amounts (256 x 8 = 2048 bytes). Then you just mask out the bits you don't need when copying. You could save on the masking with a 4KB LUT, but that might be overkill for the benefit.
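
For anyone who wants to see the LUT idea spelled out, here is a rough sketch in C rather than Z80 asm. It is not Jim's or DWedit's code, just my reading of the technique: the names `rot_lut`, `build_rot_lut` and `draw_row_byte` are made up for the example, and I'm assuming a 1bpp buffer with the leftmost pixel in bit 7 and a table indexed as [shift][byte].

```c
#include <stdint.h>

/* Hypothetical layout: rot_lut[s][b] is byte b rotated right by s bits.
   8 shifts * 256 byte values = 2048 bytes, i.e. the 2KB table. */
static uint8_t rot_lut[8][256];

/* Fill the table once at startup. */
static void build_rot_lut(void)
{
    for (unsigned s = 0; s < 8; s++) {
        for (unsigned b = 0; b < 256; b++) {
            /* (8 - s) & 7 keeps the left shift at 0 when s == 0 */
            rot_lut[s][b] = (uint8_t)((b >> s) | (b << ((8 - s) & 7)));
        }
    }
}

/* Copy one byte of tile data into a 1bpp row buffer at pixel column x.
   Assumes bit 7 is the leftmost pixel and that row has room for the
   spill-over byte when x is not a multiple of 8. */
static void draw_row_byte(uint8_t *row, unsigned x, uint8_t src)
{
    unsigned s = x & 7;                        /* shift within the byte */
    uint8_t r = rot_lut[s][src];               /* one lookup, no shift loop */
    uint8_t first_mask = (uint8_t)(0xFF >> s); /* bits that stay in byte 0 */
    uint8_t *dst = row + (x >> 3);

    /* Low (8 - s) bits of r belong to the first destination byte. */
    dst[0] = (uint8_t)((dst[0] & ~first_mask) | (r & first_mask));

    /* High s bits of r wrapped around and belong to the next byte. */
    if (s != 0) {
        dst[1] = (uint8_t)((dst[1] & first_mask) | (r & ~first_mask));
    }
}
```

The two `&` operations on `r` are the masking step. My guess at the 4KB variant is that you store both halves already masked (8 x 256 x 2 = 4096 bytes), so the copy becomes two lookups and two ORs, at the cost of twice the table.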


