Omnimaga

Calculator Community => TI Calculators => Calculator C => Topic started by: TC01 on August 16, 2010, 01:03:11 pm

Title: Issues with \0 character
Post by: TC01 on August 16, 2010, 01:03:11 pm: This morning, I checked if Solar89 can properly handle two-byte tokens (by adding the Matrix tokens). Most of the time it works... but sometimes it doesn't, and I'm not sure how I can fix it.

Here's a situation that would cause a problem: trying to run a text file containing "[A]", or the matrix A token, through Solar89. Why? The hex code for this token is 5C00h.

The way Solar89 is programmed, each line of the text file is looked at individually, and then the hex code for the token is added to an unsigned character array (since unsigned char = 8 bits). So if the line contains the token ClrHome, the program will add the character E1h to the array, then add 3Fh to the array to finish the line. A two-byte token is handled by splitting the hex code into two characters and adding them both to the array. So for the token [C] (5C02h), the program adds the character 5Ch and then the character 02h.
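A minimal sketch of the byte splitting described above. The helper name `append_token`, the `cap` parameter, and the rule "a code above FFh is a two-byte token" are assumptions for illustration, not Solar89's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Append one token code to buf, which already holds len bytes and has
 * room for cap bytes in total. Returns the new length.
 * Assumption: any code above 0xFF is a two-byte token. */
static size_t append_token(unsigned char *buf, size_t len, size_t cap,
                           unsigned int code)
{
    if (code > 0xFF) {
        assert(len < cap);
        buf[len++] = (unsigned char)(code >> 8);   /* high byte, e.g. 0x5C */
    }
    assert(len < cap);
    buf[len++] = (unsigned char)(code & 0xFF);     /* low byte, e.g. 0x02 */
    return len;
}
```

Because the length is returned and passed back in on the next call, a low byte of 00h is stored like any other value.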

The problem? For any token that has a byte of 00 (there aren't too many, but they include [A]), the \0 character is what will be added to that character array. And that seems to prevent me from adding anything else to the array. So if I'm trying to tokenize [A], no end-of-line character will be added, and anything after [A] on that line won't be added either.
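To illustrate why the 00h byte cuts things off, here is a small sketch. The array `tok_a_line` and the helper `visible_as_string` are made-up names; the bytes are the ones quoted in this post:

```c
#include <stddef.h>
#include <string.h>

/* Tokenized "[A]" followed by the end-of-line token: 5Ch 00h 3Fh. */
static const unsigned char tok_a_line[] = { 0x5C, 0x00, 0x3F };

/* How many bytes the C string functions "see": everything up to,
 * but not including, the first 00h byte. */
static size_t visible_as_string(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}
```

`sizeof tok_a_line` is 3, but `visible_as_string(tok_a_line)` stops at the 00h in the second position, so anything that relies on the terminator only ever sees 5Ch.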

But that's not the main problem. The main problem is that the code that saves the token array to a file does it by writing each individual character to the file, and it stops before it reaches \0. So even for a file only containing [A], I won't get 5Ch 00h 3Fh (3Fh being the hard return, the newline character), I'll just get 5Ch.

Fortunately, only a few tokens have a byte of 00h, and probably a lot of programs can be written without using them. Still, it would be nice to support them, and I'm not really sure how. Would I need to use something other than an unsigned character array? Or can I implement some workaround?
Title: Re: Issues with \0 character
Post by: TravisE on August 16, 2010, 03:50:52 pm: I don't know what code you're using to write the arrays, but in C many of the string-manipulation library routines treat byte 00 as indicating the end of a string. If you're using any of those, you'll probably want to switch to the functions that write a given number of bytes instead of using \0 as a terminator.
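As a rough sketch of the "write x bytes" approach, assuming the output goes through standard C stdio (the thread doesn't show the actual file code, and `write_tokens` is a hypothetical helper):

```c
#include <stddef.h>
#include <stdio.h>

/* Write exactly len bytes from data, including any 00h bytes.
 * Returns 0 on success, -1 if fewer than len bytes were written. */
static int write_tokens(FILE *fp, const unsigned char *data, size_t len)
{
    return fwrite(data, 1, len, fp) == len ? 0 : -1;
}
```

The byte count comes from the caller rather than from scanning for a terminator, so 5Ch 00h 3Fh is written as three bytes.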
Title: Re: Issues with \0 character
Post by: Netham45 on August 16, 2010, 03:59:49 pm: 00 is the standard terminator for a C string. You could either check for it and replace it with something like FF, or, as previously suggested, use an array with a separate byte count instead of a null-terminated one.
Title: Re: Issues with \0 character
Post by: TC01 on August 16, 2010, 04:06:13 pm: I do use strncat for part of this... I have a function, tokenizeString, that is given the string, an unsigned character array, and the array of tokens. It returns the number of bytes that need to be appended to the output array, and then I use strncat to do just that:

Code:
numswaps = tokenizeString(line, tokline, tokens);
output = strncat(output, tokline, numswaps);
So, I should be using memmove (or memcpy?) instead here?
Title: Re: Issues with \0 character
Post by: calc84maniac on August 16, 2010, 04:14:21 pm: Yeah, I think using an array of bytes instead of a string would work best here. Remember that you'll need to keep track of the size manually though.
Title: Re: Issues with \0 character
Post by: TravisE on August 16, 2010, 04:21:27 pm: Yes, strncat assumes 00h is a terminator. It does take a length as a parameter, but that is only an upper limit on how many characters it appends from the source (it will just truncate the string if it would go over); it still stops at the first 00h. memcpy should do what you want, copying the actual number of bytes you pass to it without caring what they are. As mentioned, you'll need an extra variable of some sort to keep track of the actual size of the data in this case.
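A small comparison sketch of the two calls. The wrapper names `copy_with_strncat` and `copy_with_memcpy` are invented for this example:

```c
#include <stddef.h>
#include <string.h>

/* Append at most n characters of src to an empty string in dst and
 * report how many arrived. dst needs room for n + 1 bytes. */
static size_t copy_with_strncat(char *dst, const char *src, size_t n)
{
    dst[0] = '\0';
    strncat(dst, src, n);   /* stops early at the first 00h in src */
    return strlen(dst);
}

/* Copy exactly n bytes, whatever their values. The caller has to
 * remember n, because nothing in dst marks the end of the data. */
static size_t copy_with_memcpy(unsigned char *dst, const unsigned char *src,
                               size_t n)
{
    memcpy(dst, src, n);
    return n;
}
```

Given the three bytes 5Ch 00h 3Fh and n = 3, the strncat version delivers one byte and the memcpy version delivers all three.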
Title: Re: Issues with \0 character
Post by: TC01 on August 16, 2010, 04:26:46 pm: Is there a reason to use memcpy over memmove for this case? memmove is working at the moment. I used it because the documentation says it can handle overlapping source and destination regions and memcpy can't.

But thanks for the help, everyone, it's working now.
Title: Re: Issues with \0 character
Post by: TravisE on August 16, 2010, 04:31:03 pm: My guess is that memcpy is smaller or faster, so it's the better choice in your program if you're absolutely sure that the source and destination regions to be copied will never overlap in memory (such as when you have two separate buffers for source and destination). If they can overlap (like when you insert/delete bytes in the same buffer and want to shift everything after up or down), then memmove should be used instead.
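For the overlapping case, a sketch of inserting bytes into the middle of one buffer might look like this. `insert_bytes` and its parameters are hypothetical, and it assumes `src` does not point into `buf`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Insert n bytes from src at offset pos in buf. buf holds len bytes
 * and has room for cap bytes in total. Returns the new length. */
static size_t insert_bytes(unsigned char *buf, size_t len, size_t cap,
                           size_t pos, const unsigned char *src, size_t n)
{
    assert(pos <= len && n <= cap - len);
    /* Source and destination are in the same buffer and may overlap,
     * so the tail is shifted up with memmove. */
    memmove(buf + pos + n, buf + pos, len - pos);
    /* src is assumed to be a separate buffer, so memcpy is fine here. */
    memcpy(buf + pos, src, n);
    return len + n;
}
```

Appending a freshly tokenized line from its own buffer to the end of the output never overlaps, which is the situation where memcpy is enough.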
Title: Re: Issues with \0 character
Post by: TC01 on August 20, 2010, 12:03:02 pm: Well, memcpy isn't exactly working, because it doesn't append the bytes to the array like strncat does; it writes at the start of the array every time. This means that only the last line of a program will be written to a file.

Is there a way to do this, or would I have to do it manually using a for loop of some sort?

EDIT: Well, I got it working using a for loop that increments the pointer, so I guess I've answered my own question for once.

Code:
numswaps = tokenizeString(line, tokline, tokens);
for (i = 0; i < numswaps; ++i)
    *(output++) = tokline[i];
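An alternative to the loop is to keep memcpy and track the write offset separately. This is only a sketch: `struct token_buf`, its fixed 256-byte size, and `token_buf_append` are made up for illustration and are not part of Solar89:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer: the bytes plus a count of how many are used. */
struct token_buf {
    unsigned char data[256];
    size_t len;
};

/* Append n bytes after what is already stored.
 * Returns 0 on success, -1 if the buffer has no room for them. */
static int token_buf_append(struct token_buf *out,
                            const unsigned char *tokline, size_t n)
{
    if (n > sizeof out->data - out->len)
        return -1;
    memcpy(out->data + out->len, tokline, n);  /* write at the end, not at data[0] */
    out->len += n;
    return 0;
}
```

Calling this once per tokenized line appends each line after the previous one, and `out->len` is then the byte count to pass to whatever writes the file. It also leaves the start of the buffer reachable, which the `*(output++)` loop does not unless the original pointer is saved somewhere.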