Topics - christop

1
Other Calc-Related Projects and Ideas / TI netcat
« on: August 16, 2012, 03:35:46 pm »
(Cross-posted from cemetech for those who haven't seen it there)

I'm working on a project called "tinc", short for "TI netcat". My original plan was to write a program that can talk to a calculator and can either connect or listen for connections over the network. The current incarnation doesn't speak over the network but rather only over stdin/stdout, but it can still speak over the network with the help of nc (or perhaps even inetd). If I decide not to add networking functionality to tinc, I'll probably rename the project to "ticat".

I've been working with Lionel Debroux and Benjamin Moody to make it work with ticables reliably. Now it's at a point where it should be pretty stable (but it's still not ready for a real release yet). I've only tested it with TiEmu and Punix (not that the OS should make a difference).

Get tinc here: https://github.com/abbrev/tinc

Anyway, here's a little networked tinc demo (using nc for the network part):
Code: [Select]
$ mkfifo pipe
$ <pipe nc -l 8002 | ./tinc -c tiemu >pipe

In Punix on the calculator (TiEmu) I ran "uterm", a terminal program that talks over the link port (kind of like minicom on *nix or HyperTerminal on Windows). Then I browsed to http://localhost:8002/ in Firefox.

In the following screenshot, the first 9 lines (11 lines with line wrapping) starting at "GET / HTTP/1.1" are the HTTP request from Firefox. The next 6 lines (starting at "HTTP/1.1 200") are the HTTP reply that I manually typed on the calculator. Firefox displayed the text "Hello world!" in plain text as soon as I hit enter on the last line. Hooray for an impractical web server. :)

[screenshot: HTTP request from Firefox and the hand-typed reply in uterm]
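
In case you're curious what the hand-typed reply looked like, it was roughly along these lines (a reconstruction, give or take the exact headers; the screenshot has the real thing):
Code: [Select]
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 13

Hello world!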

Another use for tinc is to use the Punix shell from a real terminal. This works only if Punix is listening for logins over the link port, which is not the case with the demo above (in Punix the link port can be opened by only one process at a time).

Anyway, here's a short run using the Punix shell in my Linux terminal:
Code: [Select]
$ ./tinc -c tiemu

server login: root
password:
stupid shell v0.2
root@server:~# help
available applets:
 tests     top       cat       echo      true      false     clear     uname
 env       id        pause     batt      date      adjtime   malloc    pid
 pgrp      poweroff  times     sysctltest ps        bt        crash     mul
 div       kill      time      exit      status    help
root@server:~# echo hello world!
hello world!
root@server:~# exit

server login:

This was actually my first use of tinc, but I think the HTTP demo is more impressive. :)

I hope to get tinc ready for release before too long, and I hope we can all find useful and innovative uses for it. I've already thought about writing a VNC server for Punix as a sort of replacement for the TI screenshot functionality. Who knows what else we can come up with?

2
ASM / 68k 64-bit multiplication
« on: August 03, 2011, 01:47:29 am »
Hi all! I am in search of a fast routine that can multiply two 64-bit unsigned integers and return a 128-bit unsigned integer. Preferably the inputs are in %d0:%d1 and %d2:%d3, and the output is in %d0:%d1:%d2:%d3, but I'll take what I can get if the order is different (perhaps the product can be in %d4:%d5:%d6:%d7).

Does anyone know where I can find such a routine?

Another option is a routine that can multiply two 1.63 fixed-point values and return a 2.62 fixed-point product (or, preferably, normalized to 1.63). Such a routine might be faster than shifting a 128-bit product and dropping the lower 64 bits, as there may be no need to calculate all of the low-order bits--just the ones that influence the upper 64 bits.

On a somewhat related topic, I also need a 64-bit division routine (64 / 64 => 64), also normalized. :)

In case you're curious what I need this for, I'm writing a floating-point library which uses 64-bit significands, and I have to multiply these significands for the floating-point multiply operation. The significand can be thought of as a normalized 1.63 fixed-point value, which means the high bit (the integer part) is always 1. As such, the product "p" of two 1.63 fixed-point values (a 2.62 fixed-point value) must satisfy 1.0 <= p < 4.0.
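
To make the question more concrete, here's a plain C reference model of the schoolbook decomposition I have in mind (just a model of the math, not the asm routine I'm asking for; on the 68000 each 32x32 partial product would itself be built out of 16x16 mulu instructions):
Code: [Select]
#include <stdint.h>

/* Reference model: multiply two 64-bit values and return the 128-bit
 * product in *hi:*lo. Each operand is split into 32-bit halves and the
 * four partial products are summed, tracking the carries by hand. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
        uint32_t a1 = (uint32_t)(a >> 32), a0 = (uint32_t)a;
        uint32_t b1 = (uint32_t)(b >> 32), b0 = (uint32_t)b;

        uint64_t p00 = (uint64_t)a0 * b0;       /* low  x low  */
        uint64_t p01 = (uint64_t)a0 * b1;       /* low  x high */
        uint64_t p10 = (uint64_t)a1 * b0;       /* high x low  */
        uint64_t p11 = (uint64_t)a1 * b1;       /* high x high */

        uint64_t mid   = p01 + p10;             /* may wrap; note the carry */
        uint64_t carry = (mid < p01) ? ((uint64_t)1 << 32) : 0;

        *lo = p00 + (mid << 32);
        *hi = p11 + (mid >> 32) + carry + (*lo < p00);
}

For the 1.63 case only the upper 64 bits of the product are needed, so the a0*b0 partial product matters only through its carry into the final sum--that's where the potential savings come from.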

3
ASM / Shifting 64-bit numbers (68k)
« on: February 27, 2011, 10:15:50 pm »
Ok, so I know most people here use Z80, but here is a set of 68k routines for shifting unsigned 64-bit numbers left and right. I wrote these for a floating-point emulator that I'm writing, but they're useful for other stuff too.

Code: [Select]
.equ SHIFT_THRESH, 5 | XXX the exact optimal value needs to be figured out

| unsigned long long sr64(unsigned long long, unsigned);
sr64:
        move.l  (4,%sp),%d0
        move.l  (8,%sp),%d1
        move.w  (12,%sp),%d2

| shift right a 64-bit number
| input:
|  %d0:%d1 = 64-bit number (%d0 is upper 32 bits)
|  %d2.w = shift amount (unsigned)
| output:
|  %d0:%d1, shifted
sr64_reg:
        cmp     #SHIFT_THRESH,%d2
        blo     8f
        cmp     #32,%d2
        bhs     5f
        ror.l   %d2,%d0
        lsr.l   %d2,%d1         | 00..xx (upper bits cleared)

        move.l  %d3,-(%sp)

        | compute masks
        moveq   #-1,%d3
        lsr.l   %d2,%d3         | 00..11 (lower bits)
        move.l  %d3,%d2
        not.l   %d2             | 11..00 (upper bits)

        and.l   %d0,%d2         | only upper bits from %d0
        or.l    %d2,%d1         | put upper bits from %d0 into %d1
        and.l   %d3,%d0         | clear upper bits in %d0

        move.l  (%sp)+,%d3
        rts

        | shift amount is >= 32
5:
        cmp     #64,%d2
        bhs     6f              | >= 64: result is 0 (uses the code at 6: in sl64 below)
        sub     #32,%d2
        move.l  %d0,%d1
        lsr.l   %d2,%d1
        moveq   #0,%d0
        rts

        | shift amount < threshold
7:
        | shift right one bit
        lsr.l   #1,%d0
        roxr.l  #1,%d1
8:
        dbra    %d2,7b
        rts

| unsigned long long sl64(unsigned long long, unsigned);
sl64:
        move.l  (4,%sp),%d0
        move.l  (8,%sp),%d1
        move.w  (12,%sp),%d2

| shift left a 64-bit number
| input:
|  %d0:%d1 = 64-bit number (%d0 is upper 32 bits)
|  %d2.w = shift amount (unsigned)
| output:
|  %d0:%d1, shifted
sl64_reg:
        cmp     #SHIFT_THRESH,%d2
        blo     8f
        cmp     #32,%d2
        bhs     5f
        rol.l   %d2,%d1
        lsl.l   %d2,%d0         | xx..00 (lower bits cleared)

        move.l  %d3,-(%sp)

        | compute masks
        moveq   #-1,%d3         | mask
        lsl.l   %d2,%d3         | 11..00 (upper bits)
        move.l  %d3,%d2
        not.l   %d2             | 00..11 (lower bits)

        and.l   %d1,%d2         | only lower bits from %d1
        or.l    %d2,%d0         | put lower bits from %d1 into %d0
        and.l   %d3,%d1         | clear lower bits in %d1

        move.l  (%sp)+,%d3
        rts

        | shift amount is >= 32
5:
        cmp     #64,%d2
        bhs     6f
        sub     #32,%d2
        move.l  %d1,%d0
        lsl.l   %d2,%d0
        moveq   #0,%d1
        rts

        | shift amount is >= 64
6:
        moveq.l #0,%d0
        move.l  %d0,%d1
        rts

        | shift amount < threshold
7:
        | shift left one bit
        lsl.l   #1,%d1
        roxl.l  #1,%d0
8:
        dbra    %d2,7b
        rts

These routines can be used from assembly (using the _reg versions) or from C (using the non-_reg versions).

Also, if most shifts in your program are more than the threshold (SHIFT_THRESH), you can remove the 2 instructions at the beginning and 4 lines at the bottom of both routines (in the _reg versions, that is). That would save a bit of time (for most cases) and some space.

Without the threshold sections, these routines seem to be smaller (and probably faster) than the assembly code generated by TIGCC for shifting 64-bit numbers ("long long" type). My code does check for shifts of 64 or more, whereas TIGCC doesn't (shifts greater than or equal to the width of the type are undefined behavior in C anyway, but I wanted defined behavior in my code).
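
For reference, here's the behavior these routines implement, expressed in C (just a behavioral model to test against, not how the asm works):
Code: [Select]
#include <stdint.h>

/* Behavioral model of sr64/sl64: shifts of 64 or more return 0
 * instead of hitting C's undefined behavior. */
static uint64_t sr64_ref(uint64_t x, unsigned n)
{
        return (n >= 64) ? 0 : (x >> n);
}

static uint64_t sl64_ref(uint64_t x, unsigned n)
{
        return (n >= 64) ? 0 : (x << n);
}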

I'll double-check the sizes and timings in mine relative to the generated code and then bring this to the TIGCC maintainers' attention if mine is better. Smaller and faster code is always a good thing, right? :D

4
TI 68K / Punix
« on: February 26, 2011, 02:19:01 pm »
Recently I started experimenting with floating-point emulation on the 68K calculators, specifically in my own OS, Punix. (I'm new to this forum, and I haven't talked about Punix lately in other TI forums either, so for those who don't know what Punix is: it's a Unix-like OS--complete with features like preemptive multitasking and a VT220 terminal emulator--that I'm writing for the TI-92+ and maybe the TI-89.)

Here's a little background on floating-point emulation. First of all, the 68000 processor has no support for FP hardware (such as the 68881 floating-point unit, or FPU); only later CPUs in the family do (the '020 and '030 via a coprocessor interface, the '040 and '060 on-chip). However, the 68000 does have an exception vector (the so-called "F-line" or "Line 1111 emulator" exception) which traps unimplemented F-line instructions so that software can emulate them. This is basically an FPU emulator, and that is what I have done. Needless to say, software FP emulation is far slower than a real FPU--by about an order of magnitude in the best case.
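
To give an idea of how the trap works: the Line 1111 emulator exception is vector 11 on the 68000, and the stacked PC points at the F-line instruction itself, so the handler can fetch the opcode word and decide whether it's an FPU instruction before decoding any further. Here's a rough sketch of that first check in C (assumed names; this is not Punix's actual handler):
Code: [Select]
#include <stdint.h>

/* Hypothetical first step of an F-line handler: fetch the opcode word
 * at the faulting PC and check whether it belongs to the FPU
 * (coprocessor ID 1 = 68881/68882). Not the actual Punix code. */
static int is_fpu_opcode(uint32_t fault_pc)
{
        uint16_t op = *(const uint16_t *)(uintptr_t)fault_pc;

        /* bits 15-12 = 1111, bits 11-9 = coprocessor ID */
        return (op & 0xf000) == 0xf000 && ((op >> 9) & 0x7) == 1;
}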

The 68881 and 68882 are the FPUs of choice for the 68020 (and also for the FPU-less versions of the '40), so this is what I chose to emulate. This FPU supports several integer and floating-point formats (byte integer, word integer, long integer, single-precision float, double-precision float, extended-precision float, and packed decimal float) and several addressing modes (AFAICT, the same modes as the 68000). The internal FP registers of the 6888x FPUs are 80-bit extended-precision (stored in memory as 96 bits), as described on this page: http://en.wikipedia.org/wiki/Extended_precision

By contrast, the TI-AMS on the 68k calcs uses an 80-bit SMAP II BCD floating-point format, which has less precision and a smaller range than extended precision, and is slower to calculate with. (Single precision is smaller and faster than the others, but it also has the least precision and the smallest range.)


Anyway, after a few hours of work, I implemented basic instruction decoding and the execution of a few simple FP instructions, notably FTST, FNEG, and FBcc (which is analogous to the regular Bcc instructions). Here is some demo code showing what my FPU emulator can handle so far:

Code: [Select]
| print the FP type using the current FP condition codes
printfptype:
        movem.l %a0-%a3,-(%sp)

        | test for nan
        fbngle  1f              | NGLE = not greater, less, or equal

        | test for zero/not zero
        fbneq   2f              | NEQ = not equal
        pea.l   4f              | 0
        bra     3f
2:      pea.l   5f              | x
0:      bra     3f

1:      pea.l   9f              | nan

        | test for sign (doesn't recognize negative zero)
3:      fblt    2f              | LT = less than
        pea.l   7f              | +
        bra     0f
2:      pea.l   6f              | -
0:      pea.l   8f

        jbsr    printf
        lea.l   (3*4,%sp),%sp

        movem.l (%sp)+,%a0-%a3
        rts

4:      .asciz  "0"
5:      .asciz  "x"
6:      .asciz  "-"
7:      .asciz  "+"
8:      .asciz  "%s%s\n"
9:      .asciz  "nan"

| here is the main entry point for the demo
| fputest() is called from a user-space driver program
        .global fputest
fputest:
        move    #-5,%d0

        | %Dn.b
        fneg.b  %d0,%fp0
        bsr     printfptype

        | (d16,%pc)
        fneg.x  (11f,%pc),%fp1
        bsr     printfptype

        | (An)
        lea     11f,%a3
        fneg.x  (%a3),%fp2
        bsr     printfptype

        | (An)+
        fneg.x  (%a3)+,%fp3
        bsr     printfptype
        fneg.x  (%a3)+,%fp4
        bsr     printfptype
        fneg.x  (%a3)+,%fp5
        bsr     printfptype
        fneg.l  (%a3)+,%fp6
        bsr     printfptype
        fneg.l  (%a3)+,%fp7
        bsr     printfptype
        fneg.w  (%a3)+,%fp0
        bsr     printfptype
        fneg.w  (%a3)+,%fp1
        bsr     printfptype
        fneg.b  (%a3)+,%fp2
        bsr     printfptype
        fneg.b  (%a3)+,%fp3
        bsr     printfptype

        | -(An)
        fneg.b  -(%a3),%fp4
        bsr     printfptype
        fneg.b  -(%a3),%fp5
        bsr     printfptype
        fneg.w  -(%a3),%fp6
        bsr     printfptype
        fneg.w  -(%a3),%fp7
        bsr     printfptype
        fneg.l  -(%a3),%fp0
        bsr     printfptype
        fneg.l  -(%a3),%fp1
        bsr     printfptype

        rts

11:     .long   0x7fffffff,0x40000000,0x00000000        | nan
12:     .long   0x7fffffff,0x00000000,0x00000000        | +inf
        .long   0xffffffff,0x00000000,0x00000000        | -inf
13:     .long   7
        .long   -7
        .word   5
        .word   0
        .byte   3
        .byte   0

This code tests a few different addressing modes (data register direct, PC-relative with displacement, address register indirect, and address register indirect with post-increment/pre-decrement) and integer/float formats (extended-precision, long, word, byte). My emulator cannot handle single- or double-precision or packed decimal formats yet, so I haven't tested those.

The output of this code is "-x" for a negative number, "+x" for a positive number, "+0" for zero (the test code doesn't differentiate between positive and negative zero), or "+nan" for NaN.

Here is a screenshot of this demo code in action:

[screenshot: output of the FPU demo in TiEmu]
(You may notice that it displays "+x" for both infinities, whereas it should display "-x" for negative infinity. I already fixed this bug but didn't update the screenshot.)

Each emulated instruction takes between about 600 and 1400 cycles, which is between 8.5 and 20 KFLOPS (kilo floating-point operations per second) with a 12MHz clock rate. According to http://en.wikipedia.org/wiki/Motorola_68881 the 68881 runs at 160 KFLOPS at 16MHz, which would be about 120 KFLOPS at 12MHz. As you can see, this FPU emulator is about 1/10 as fast as a real FPU. (Take these with a grain of salt as I haven't really started on the more computationally expensive operations like multiplication and division).

Since this demo only tests simple operations (FNEG basically just flips a single bit), most of the time is spent decoding the instruction itself. I might also make the bare floating-point routines available to user programs (so they can do, e.g., "bsr fneg" to negate (%a0) and put the result in (%a1)), which would remove the instruction-decoding overhead, but for now I'll stick with full FPU emulation.

Edit: fixed note about the bug regarding +/- infinity. I was off by one looking at the display.
Edit again: the negative infinity bug is squished! I didn't check the sign for infinity and nan in the FTST code (which FNEG does implicitly).
Edit thrice: I just realized that an extended-precision value with a maximum exponent (0x7fff) is a NAN only if the fraction (excluding the MSB of the fraction) is non-zero. If the rest of the fraction is zero, it's an infinity, not a NAN, regardless of that MSB. I fixed my code to reflect this correction. Read this for (many) more details: http://www.freescale.com/files/archives/doc/ref_manual/M68000PRM.pdf
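
For reference, here's that classification rule written out in C, using the 96-bit in-memory layout (sign/exponent word, 16 unused bits, then the 64-bit mantissa with an explicit integer bit). This is just a sketch of the rule (sign and denormals are ignored), not the emulator's actual code:
Code: [Select]
#include <stdint.h>

/* Classify a 68881 extended-precision value stored as three long words. */
enum fpclass { FP_ZERO_, FP_INF_, FP_NAN_, FP_NORMAL_ };

static enum fpclass classify_x96(const uint32_t w[3])
{
        uint32_t exp = (w[0] >> 16) & 0x7fff;   /* biased exponent     */
        uint32_t hi  = w[1];                    /* mantissa bits 63-32 */
        uint32_t lo  = w[2];                    /* mantissa bits 31-0  */

        if (exp == 0x7fff) {
                /* fraction = mantissa without the explicit integer bit */
                if (((hi & 0x7fffffff) | lo) != 0)
                        return FP_NAN_;
                return FP_INF_;                 /* integer bit is a don't-care */
        }
        if (exp == 0 && hi == 0 && lo == 0)
                return FP_ZERO_;
        return FP_NORMAL_;
}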
