
View Full Version : Binary format G4/i386




rinseout
Jan 10, 2005, 09:17 PM
I'm starting to do some development work on my iBook; the code is just number-crunching code, and has to be portable between Linux (i386) and OS X.
(The target platform is actually Linux, but that's not where I'm doing the development).

Anyway, one of my programs does a bunch of calculations and then writes out binary format files that contain lots of numbers and some characters. (Other programs read these files in.)

If I run the same program on my Linux box and my iBook and md5sum them, I find the output files have different md5 fingerprints which is worrisome.

Is this just a symptom of the internal representation of numbers/characters being different on the two boxes? Or is it something potentially serious?



csubear
Jan 10, 2005, 09:36 PM
That's very strange. The file should be the same. How did you transfer the file to the Linux box? FTP? If so, make sure that you weren't transferring in ASCII mode; it has to be binary. In ASCII mode it may rewrite newlines and end-of-file markers.

whenpaulsparks
Jan 10, 2005, 10:23 PM
The only thing I can see is that it could be a big/little-endian discrepancy, but I don't know whether that affects binary files too. I think it does...

northen
Jan 11, 2005, 12:46 AM
The only thing I can see is that it could be a big/little-endian discrepancy, but I don't know whether that affects binary files too. I think it does...

It is most definitely an issue of endianness. The native byte order of PowerPC processors (although it can be switched) is 4321, i.e. big-endian, whereas it's 1234, little-endian, on IA-32 processors.

From my FreeBSD box (Intel P4):

elysium# sysctl -a | grep byte
hw.byteorder: 1234

And from my Power Mac G5 (Panther):

boadicea:~ david$ sysctl -a | grep byte
hw.byteorder: 4321


You can, however, reverse the effects of the difference by bit-shifting in C, although it's a small bit of work :)

rinseout
Jan 11, 2005, 04:02 PM
Thanks for that. I don't care (too much) that these binary files aren't going to be portable, only that, once compiled, the programs work on their respective platforms.

Thank goodness it isn't something more serious!

rinseout
Jan 11, 2005, 07:05 PM
... so say I decided I actually do want files generated on i386 machines to be usable on my iBook for testing purposes. Is it a big deal to write (C/C++) functions that do the necessary conversions? The reason I ask is that it isn't really practical to have my iBook do calculations that take a day or so on a Beowulf cluster, but it is practical to be able to access the results of those calculations.

As far as I can remember, the records in the files are just a bunch of fixed-length character array literals, signed and unsigned integers, maybe some long ints, and double-precision floats.

How much of a headache is this, and can you point me to any examples on how to effect this conversion?

Update: I have managed to find some examples on the web that refer to "dealing with some endian stuff", but I'd really be grateful for a pointer to how to do this properly. Things I'm a bit worried about are sizeof giving different values on PPC versus x86, and properly converting binary formatted x86 types (including floating point types) to their PPC equivalents, preferably maintaining NaN and over-/underflow if it's not too much of a headache.

It would be acceptable to anoint x86 format as "proper", and have code for compiled PPC versions to do the proper conversions on the I/O.

I would like to do this as properly as possible without spending more than a couple of days on it. It'd be great if a solution to doing this on 64 bit processors was part of it, but not necessary.

Thanks.

northen
Jan 12, 2005, 02:57 AM
... so say I decided I actually do want files generated on i386 machines to be usable on my iBook for testing purposes. Is it a big deal to write (C/C++) functions that do the necessary conversions? The reason I ask is that it isn't really practical to have my iBook do calculations that take a day or so on a Beowulf cluster, but it is practical to be able to access the results of those calculations.

As far as I can remember, the records in the files are just a bunch of fixed-length character array literals, signed and unsigned integers, maybe some long ints, and double-precision floats.

How much of a headache is this, and can you point me to any examples on how to effect this conversion?

Update: I have managed to find some examples on the web that refer to "dealing with some endian stuff", but I'd really be grateful for a pointer to how to do this properly. Things I'm a bit worried about are sizeof giving different values on PPC versus x86, and properly converting binary formatted x86 types (including floating point types) to their PPC equivalents, preferably maintaining NaN and over-/underflow if it's not too much of a headache.

It would be acceptable to anoint x86 format as "proper", and have code for compiled PPC versions to do the proper conversions on the I/O.

I would like to do this as properly as possible without spending more than a couple of days on it. It'd be great if a solution to doing this on 64 bit processors was part of it, but not necessary.

Thanks.

I would recommend you use the function

uint32_t htonl(uint32_t hostlong);

to do the conversion before you transfer the files to your Mac. It converts a 32-bit value from host byte order (little-endian on your x86 boxes) to network byte order, which is big-endian, the same as the PowerPC.

You could do a simple program:


#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main(void) {
    uint32_t little_endian, big_endian;
    FILE *input  = fopen("input",  "rb");
    FILE *output = fopen("output", "wb");

    if (input == NULL || output == NULL) {
        perror("fopen");
        return 1;
    }

    /* Loop on fread's return value rather than feof(): feof only becomes
       true after a read has already failed, so the feof-style loop would
       write the last value twice. */
    while (fread(&little_endian, sizeof little_endian, 1, input) == 1) {
        big_endian = htonl(little_endian);
        fwrite(&big_endian, sizeof big_endian, 1, output);
    }

    fclose(input);
    fclose(output);
    return 0;
}

Note the uint32_t instead of unsigned long: on 64-bit platforms unsigned long is often 8 bytes, which would misalign the stream.