binary file, low level byte code




MrFusion
Jun 26, 2008, 02:46 AM
Hello,

I have a binary file, and I know what each byte represents.
I can read the file, convert it to bytes, and access each byte separately.

// retrieve data
NSData *data = [NSData dataWithContentsOfFile:@"filename.ext"];
const unsigned char *bytes = [data bytes];
// header: dump the first 99 bytes (or fewer if the file is shorter)
int j;
for (j = 0; j < 99 && j < (int)[data length]; j++) {
    NSLog(@"%i %i _%c_", j, (int)bytes[j], (int)bytes[j]);
}


There are several blocks of bytes which should be converted to meaningful data, some blocks are strings, some are floats, etc.
I am looking for functions to help with this, more specifically a way to copy part of this byte array into a new, smaller array.
So far I have only come across methods that append new bytes to an array or copy the array completely.

Converting to a string should not be a problem. NSString has a function to convert bytes to a string:
+ (id)stringWithCharacters:(const unichar *)chars length:(unsigned)length

Converting to a float is probably also not difficult, I guess.

Also, I am a bit concerned about memory management.
If I start making many const char arrays, how will this affect memory management?
I am more used to the higher-level Cocoa code: init, alloc, release.

Thanks!



Enuratique
Jun 26, 2008, 03:35 AM
It's a good thing to be prudent about memory management. This article (http://www.stepwise.com/Articles/Technical/MemoryManagement.html) may help (I don't know what your level of understanding is), but the rule of thumb is that any framework class method that hands you back an object (a convenience constructor such as dataWithContentsOfFile:) has already had autorelease called on it. The object stays valid until the current autorelease pool is drained, which in an ordinary Cocoa application happens at the end of the current pass through the event loop, and then it is released for you automatically. If you need it longer than that, call retain on it as soon as you are given it and release it when it's no longer needed. Of course, as you know, you are responsible for releasing anything you create yourself with alloc / init.

In the sample code you provided, the const unsigned char * you get back has no memory implications for you: it is just a pointer to the bytes held inside the NSData object, and the NSData itself is autoreleased because you created it with a class method. The pointer is only valid for as long as that NSData object is alive, so don't hold on to it after the NSData has been released.

To answer your second question: if you know what each byte means, and a single byte is supposed to represent a number, keep in mind that the number can't be larger than 255, since that's as big a value as 1 byte will hold. In that case you can simply cast the byte to an int or a float and it should just work. If, on the other hand, a true int was written to the file as 4 sequential bytes, you'll have to perform the same kind of voodoo this guy (http://www.astahost.com/info.php/convert-byte-array-int-vice-versa_t344.html) is doing (basically bit shifting coupled with pointer math to get the desired result).
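
For reference, a minimal sketch of that bit shifting in plain C. The function names are made up for illustration, and which of the two you need depends on the byte order your file format uses, which is an assumption you have to check against its documentation.

```c
#include <stdint.h>

/* Build a 32-bit unsigned value from 4 bytes stored least significant
   byte first (little-endian). */
uint32_t u32_from_le_bytes(const unsigned char *b) {
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Same idea for bytes stored most significant byte first (big-endian). */
uint32_t u32_from_be_bytes(const unsigned char *b) {
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         | (uint32_t)b[3];
}
```

Casting each byte to uint32_t before shifting keeps the shift by 24 from overflowing a signed int.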

I can't guarantee that the same approach will just work for a float, because how a float is stored isn't very straightforward. There's a whole IEEE standard (IEEE 754) on what each bit in the 4 bytes means, so casting an 8-bit byte, or an integer assembled from the 4 bytes, to a 32-bit float converts the numeric value instead of reinterpreting the stored bit pattern. I should also caution you that, as far as I know, the iPhone hardware is not optimized to handle floating point. If you can, try to use integer math there, as floating point operations (addition, multiplication, division, etc.) can take a disproportionate amount of time to execute compared to their integer equivalents. As I understand it, this happens when the hardware doesn't run them natively, so each operation has to be broken down into multiple simpler operations on ints, which is something the hardware can do natively.

HTH,

Enuratique

MrFusion
Jun 26, 2008, 04:18 AM
Thanks for the answer! I now see that I don't have to worry about the const unsigned char * pointer. I also saw that NSData has a method for getting a subrange of the data (subdataWithRange:). I can now get going on converting the binary file.
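
For anyone who wants to do the same on the raw byte pointer instead of going through NSData, here is a rough sketch in plain C. The function name and its return convention are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copy `count` bytes starting at `offset` from `src` (which holds `src_len`
   bytes) into `dst`. Returns 1 on success, 0 if the requested range falls
   outside the source buffer. `dst` must have room for `count` bytes. */
int copy_byte_range(const unsigned char *src, size_t src_len,
                    size_t offset, size_t count, unsigned char *dst) {
    if (offset > src_len || count > src_len - offset) {
        return 0;
    }
    memcpy(dst, src + offset, count);
    return 1;
}
```

The range check is written as `count > src_len - offset` so that it cannot wrap around for large values.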

I haven't looked into the C functions for converting bytes into floats yet, but I did something similar in Java many summers ago. So yes, you are right, it's not as easy as just casting a byte to a float. Thanks for the link, it saved me a ton of time.

No need to worry about the iPhone. I haven't jumped on that bandwagon yet.

lee1210
Jun 26, 2008, 10:15 AM
A word of warning regarding this sort of activity:
You need to be aware of endianness in this mixed-architecture world. If you will have users that span Intel Macs (little-endian) and PowerPC Macs (big-endian), this matters; the iPhone's ARM processor is little-endian like Intel. You could always store the binary file with native endianness, but that means it can't be copied from a machine of one architecture to one with a different byte order.

I would say that you could just have an "is_little_endian" function that sets a global variable, which is then used in your "float_from_bytes", "int_from_bytes", etc. routines. It would go something like this:
int is_le;
/* Sets is_le to 1 on a little-endian machine, 0 on a big-endian one,
   and returns the same value. */
int is_little_endian(void) {
    /* Store 1 in an int and look at the first byte in memory:
       it is 1 only if the least significant byte comes first. */
    unsigned int one = 1;
    is_le = (*(unsigned char *)&one == 1);
    return is_le;
}

You would have to call that in an initialization routine, then you could refer to is_le when you are doing your conversions from bytes.

From that point, you could probably get away with calling -getBytes:range: on your NSData object, giving a range for each type. You could store that as character data, then do the swap if necessary (you'll have to choose an endianness for file storage, and swap when is_le indicates the opposite endianness).
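
A rough sketch of what such a "float_from_bytes" routine could look like in plain C. The `file_is_le` parameter and the helper name are assumptions for illustration, and the code assumes a 4-byte IEEE 754 float, which holds on the Macs discussed here.

```c
#include <string.h>

/* Return 1 if this machine stores the least significant byte first. */
static int host_is_little_endian(void) {
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Decode 4 bytes read from the file into a float. `file_is_le` says which
   byte order the file was written in (nonzero = little-endian); this is an
   assumption about your file format. */
float float_from_bytes(const unsigned char *bytes, int file_is_le) {
    unsigned char tmp[4];
    float value;
    if ((file_is_le != 0) == host_is_little_endian()) {
        memcpy(tmp, bytes, 4);      /* same byte order: copy as is */
    } else {
        tmp[0] = bytes[3];          /* opposite byte order: reverse */
        tmp[1] = bytes[2];
        tmp[2] = bytes[1];
        tmp[3] = bytes[0];
    }
    memcpy(&value, tmp, sizeof value);
    return value;
}
```

The bytes would come from -getBytes:range: (or the pointer returned by -bytes) at the offset of the field you are decoding.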

You can do the swap by either manipulating the character array then doing a memcpy into the target variable, or reading the value into your type, say, int, then setting a char * to the pointer to this int, and swapping the bytes that way. In the case of character strings no swapping is necessary, but if the string is UTF-8 you'll probably need to get it into a char *, null terminate it, then instantiate an NSString from there.
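
For the string case, a small sketch in plain C of getting a fixed-length text field into a null-terminated char buffer. The function name is hypothetical, and it assumes you know the field length from the file format documentation.

```c
#include <stddef.h>
#include <string.h>

/* Copy a fixed-length text field out of a byte buffer and null-terminate it.
   `dst` must have room for at least `field_len + 1` bytes. */
void copy_text_field(const unsigned char *field, size_t field_len, char *dst) {
    memcpy(dst, field, field_len);
    dst[field_len] = '\0';
}
```

The resulting buffer can then be handed to an NSString constructor that takes a C string and an encoding.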

I would stick to getting base types first, then wrapping them in NSNumber, as I don't see NSNumber methods like "initWithBytesAsInt" or anything like that.

Good luck!

-Lee

MrFusion
Jun 26, 2008, 11:25 AM
Thanks for the warning.
I will probably be the only one using this software, though.
The software that created the binary file has limited export functions. I would rather write my own quick & dirty routine and save myself a few thousand mouse clicks exporting the data into an ASCII file.

MrFusion
Jul 22, 2008, 09:12 AM
Hello,

To come back to this: I figured out what most of the bytes in my binary file mean, since I have the documentation.

Each time, I have a set of 4 bytes, which I feed into this method to get a float out of it.


...
[MyBytes convertToFloat:[data subdataWithRange:NSMakeRange(position,4)]];
...



+(NSNumber *) convertToFloat:(NSData *)data {
    const unsigned char *bytes = [data bytes];
    float getal = (float)(*(bytes + 3) << 24 | *(bytes + 2) << 16 | *(bytes + 1) << 8 | *bytes);
    // float getal = (float)(*(bytes) << 24 | *(bytes + 1) << 16 | *(bytes + 2) << 8 | *(bytes + 3));
    return [NSNumber numberWithFloat:getal];
}


If I print the first set of bytes as ints to the screen, I get these values: 246 - 15 - 115 - 44.
This should get converted into one of these values:
3.45412E-012
1.43389E-013
3.61861E-012
7.07753E-014
with the first value as the most likely candidate.
However I get either 7.457382e+08 or -1.667596e+08 (big vs little endian).

I guess converting to float did not turn out to be as easy as I thought it would be.

Any suggestions on how to proceed best, or can you spot an error?

Thanks

lazydog
Jul 22, 2008, 09:49 AM
Hi

What do you get if you do this:-


const unsigned char *bytes = [data bytes];
float getal = *( (float *) bytes ) ;
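
To show the difference between the two approaches, here is a small sketch in plain C. The function names are made up; 0x2C730FF6 is the integer you get by assembling the bytes 246, 15, 115, 44 with the least significant byte first. The sketch uses memcpy for the reinterpretation, which has the same effect as the pointer cast but does not depend on the bytes being suitably aligned.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the integer is converted to the nearest float,
   so 0x2C730FF6 (745738230) becomes roughly 7.457382e+08. */
float convert_value(uint32_t bits) {
    return (float)bits;
}

/* Bit reinterpretation: the same 32 bits are read as an IEEE 754 float.
   Assumes float is 4 bytes wide. */
float reinterpret_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With 0x2C730FF6 as input, the first function gives the large value from the earlier post and the second gives a value close to 3.45412E-012.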


b e n

lee1210
Jul 22, 2008, 12:12 PM
What about this:
+(NSNumber *) convertToFloat:(NSData *)data {
    const unsigned char *bytes = [data bytes];
    float getal = 0.;
    memcpy(&getal, bytes, 4);
    return [NSNumber numberWithFloat:getal];
}
?

The bytes you sent are correct for 3.45412E-012. I find this page handy:
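
If you want to check a bit pattern by hand instead, this is a minimal sketch in plain C of the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits). The function name is invented, and zero, subnormals, infinities and NaN are deliberately not handled.

```c
#include <stdint.h>

/* Decode a normalized IEEE 754 single-precision bit pattern by hand.
   Special cases (zero, subnormal, infinity, NaN) are not handled here. */
double decode_ieee754_single(uint32_t bits) {
    int sign = (int)((bits >> 31) & 1u);
    int exponent = (int)((bits >> 23) & 0xFFu) - 127; /* remove the bias */
    uint32_t fraction = bits & 0x7FFFFFu;             /* low 23 bits */
    double value = 1.0 + (double)fraction / 8388608.0; /* implicit 1 + f / 2^23 */

    /* Scale by 2^exponent with plain multiplication or division. */
    while (exponent > 0) { value *= 2.0; exponent--; }
    while (exponent < 0) { value /= 2.0; exponent++; }

    return sign ? -value : value;
}
```

For the bytes 246, 15, 115, 44 read least significant byte first, the pattern is 0x2C730FF6: sign 0, biased exponent 88, fraction 0x730FF6, which works out to about 1.8989 times 2 to the power -39.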
http://babbage.cs.qc.edu/IEEE-754/

-Lee

MrFusion
Jul 23, 2008, 08:45 AM
Both yours and Ben's solution work. Thanks!