PDA

View Full Version : How is the size of an integer decided?




celia
Jun 26, 2007, 12:30 AM
Hi,

How is the size of an integer decided?
- Is it based on the processor, the compiler, or the OS?


Thanks.



gauchogolfer
Jun 26, 2007, 12:33 AM
Do you want to know what determines the maximum size of permitted integers (i.e. 32-bit versus 64-bit)?

garethlewis2
Jun 26, 2007, 02:22 AM
For Java, it is built into the runtime: an int is always 32-bit.

For OS X and Windows, if memory serves me correctly, it is defined in types.h, which specifies how much memory is allocated for each type. It was done in terms of char, which was defined as

#define char 1

So an int used to be defined as

#define int char*4

Probably no longer determined in terms of char, because what is a char these days? Is it 8 bits, 16 bits, or is it UTF-8, which is variable-length? wchar_t is 16 bits on some platforms, but that isn't set in stone.

celia
Jun 26, 2007, 04:44 AM
#define char 1

So an int used to be defined as

#define int char*4

So it does not depend on the processor (32-bit / 64-bit)?

cblackburn
Jun 26, 2007, 04:50 AM
So it does not depend on the processor (32-bit / 64-bit)?

Yes and no. C is/was designed to be a portable language, so you could take your programs and run them on another computer easily. Hence the compiler will make an int whatever size you tell it; however, if you set an int to be 64-bit on a 32-bit machine, the computer has to break the 64-bit word down into two 32-bit words, perform the operation on both parts sequentially, then stitch the result back together, which is a *lot* slower than just operating on a 32-bit integer.

Hence, usually, the size of an int is set to whatever the size of the processor ALU is; however, this is purely a performance choice.

Chris

Krevnik
Jun 26, 2007, 02:47 PM
Yes and no. C is/was designed to be a portable language so you could take your programs and run them on another computer easily. Hence it will make the size of an int whatever you tell it... <snip>

Uhm, this is a bit strange, since int is defined as a language keyword in C. It is defined by the compiler, not by headers. That's why you could (in theory) write an app with no headers if all it did was return a code based on some arguments, etc. (since main is supposed to return an int, but can return void).

The compiler defines int as the ALU size (processor bit-size), and I believe if you read K&R C and the ANSI C spec, this is the intended design. You /can/ override it with custom compilers, but then your compiler doesn't adhere to either C standard if you do.

In a C program, if you need to know the size of an int, you can use the sizeof(int) expression to do it (if for some reason you run on multiple architectures). Most platform APIs provide defined types which represent the preferred integer of the platform, to make it easier to write code for a platform that has both 32-bit and 64-bit APIs.

GeeYouEye
Jun 26, 2007, 03:21 PM
The most careful thing you can do is use header-based typedef'ed integers:

int32_t, uint32_t, int64_t, uint64_t, etc., assuming whatever you're coding will have the same libraries on all platforms. Otherwise, just watch out and use sizeof() liberally.

toddburch
Jun 26, 2007, 04:28 PM
I would have to say the size of an integer is predicated on the size of the CPU's registers. However, certainly, a compiler could define any length it wanted and subsequently leverage, or work-around, the actual hardware.

In the early days of PCs, registers were 2 bytes (16 bits); thus, a "word" was 2 bytes, and so was an integer.

As processors evolved, and addressing did too, registers moved to 4 bytes long, and thus it made sense to define an integer as 4 bytes.

With 64 bit processors, registers (on the mainframe, at least) are 8 bytes. However, we have not seen integers evolve to 8 bytes, and I'm guessing they will be 4 bytes for some time to come. On an IBM 64-bit box, for example, with 8 byte registers, if the 4-byte instructions are used (AKA 31-bit instructions), only the right half of an 8-byte register is used. No muss, no fuss.

Todd

Krevnik
Jun 26, 2007, 05:03 PM
With 64 bit processors, registers (on the mainframe, at least) are 8 bytes. However, we have not seen integers evolve to 8 bytes, and I'm guessing they will be 4 bytes for some time to come. On an IBM 64-bit box, for example, with 8 byte registers, if the 4-byte instructions are used (AKA 31-bit instructions), only the right half of an 8-byte register is used. No muss, no fuss.

Todd

This is how it works with the Core 2 Duo (x64) and the G5 (PPC64) as well. 32-bit mode only uses half of the 8-byte register available, but if you are running a 64-bit clean app (compiled for 64-bit), then your int will be 8 bytes, as will your pointers.

toddburch
Jun 26, 2007, 05:24 PM
This is how it works with the Core 2 Duo (x64) and the G5 (PPC64) as well. 32-bit mode only uses half of the 8-byte register available, but if you are running a 64-bit clean app (compiled for 64-bit), then your int will be 8 bytes, as will your pointers.

Yeah, same on the mainframe. It's a compiler option to exploit full 64-bit or not. However, the terminology for an "integer" is still 4 bytes. If referring to 64-bit integers, we say "8-byte integers" or "double-word integers". A word is still 4 bytes.

Todd

PS: Just as an FYI, I want to point out that I did not make a typo when I said "31-bit". PCs / Macs / other machines might be 32 bit, but IBM machines, up until lately, only addressed 31 bits. The high order bit (left-most bit) was used to indicate addressing mode, which could be 24-bit mode (<= 16MB) when off or 31-bit mode (<=2GB) when on. On the 64-bit machines, there are other mechanisms to set and query addressing mode.

(And, to be complete, yes, registers on a mainframe are certainly a full 32-bits, allowing for 32 bits of precision, but only 31 bits of memory can be addressed)

iSee
Jun 26, 2007, 05:54 PM
Here's what the C99 spec has to say: "A 'plain' int object has the natural size suggested by the architecture of the execution environment."

Krevnik
Jun 26, 2007, 06:34 PM
Yeah, same on the mainframe. It's a compiler option to exploit full 64-bit or not. However, the terminology for an "integer" is still 4 bytes. If referring to 64-bit integers, we say "8-byte integers" or "double-word integers". A word is still 4 bytes.


This seems to speak to the mainframes you have been exposed to. Different architectures use different terminology. Windows still defines a word as 16-bit, and a 32-bit int is a double word. Mainframes I have worked on used 48-bit words (as you can imagine, anything using bit-packing and assuming 48-bit word boundaries is interesting to port to other architectures).

In C-speak, an int is an int and is the size of the CPU registers (as stated quite simply by iSee). char is a byte, short is a 16-bit integer, longs are 32-bit, and long longs are 64-bit. wchar_t is a 16-bit unsigned integer for UTF-16 support. Those definitions don't change just because the metal does. Those who do assembly work tend to be a bit more interested in the specific definition of what a word is.

Sayer
Jun 26, 2007, 08:58 PM
Platform bickering aside, the size of an integer is determined by the total number of possible values it may represent.

E.g. a byte, or 8 bits, can have one of 256 different possible values, from 0-255 (0x00 to 0xFF in hex). That would be an unsigned int, btw, meaning it has only positive values. A signed integer can be positive or negative, and this affects the range of possible values such that you can represent from -128 to +127.

A "short" int in old-school Mac parlance, typically two bytes or 16 bits and typed as UInt16 on the Mac now, has a max. (unsigned) value of 65,535, or 0xFFFF in hex.

A "long" is a 32-bit value, or four bytes, and has a max. (unsigned) value of 4,294,967,295, or 0xFFFFFFFF in hex. The type in Mac programming (Carbon, typically) is UInt32.

A float is typically 32 bits wide; a double is 64 bits, I believe (the max. value may depend on the implementation).

The PowerPC accesses memory in chunks sized as multiples of 4 bytes, so making memory structures aligned to four bytes is more efficient for the CPU. Example:

struct myStruct {
    UInt32 myVal;            // 4 bytes
    UInt16 halfVal;
    UInt16 otherHalfVal;     // two 16s = 32 bits, or 4 bytes
    UInt16 someOtherHalfVal; // only 2 bytes
    UInt16 fillerVal;        // fills the leftover 16 bits to keep this struct
                             // 4-byte aligned; also gives room for expansion later
};

garethlewis2
Jun 27, 2007, 02:13 AM
For C, int is defined by your compiler. The compiler is itself compiled in a C environment, and an int is defined using types.h.

This is why you cannot trust that an int is 32-bit or 64-bit. You must always, always use sizeof when mallocing memory. NSInteger in Leopard is just syntactic sugar that is globally set in your project.

It doesn't matter what your registers hold on the CPU; it hasn't for a good 20 years. When ANSI made C an actual certified language, the sizeof an int was set to 4 bytes. This is true for all C environments. Those that code on ****** compilers in embedded environments use a variant of C known as ****tard C.

techgeek
Jun 29, 2007, 08:08 AM
No need to get abusive.
Some of us have to make a living in those embedded environments, thank you very much!

A char is 8 bits
An int is at least 16 bits
These values can be greater, but not smaller.

From the C99 standard:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

Section 5.2.4.2.1 Sizes of integer types <limits.h>

"...implementation defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign."

— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

— minimum value for an object of type int
INT_MIN -32767 // −((2^15) − 1)
— maximum value for an object of type int
INT_MAX +32767 // (2^15) − 1
— maximum value for an object of type unsigned int
UINT_MAX 65535 // (2^16) − 1

gekko513
Jun 29, 2007, 08:52 AM
Using gcc:

int - 32 bit
long - 32 bit in a 32-bit environment, 64 bit in a 64-bit environment
long long - always 64 bit

Krevnik
Jun 29, 2007, 09:04 AM
Using gcc:

int - 32 bit
long - 32 bit in a 32-bit environment, 64 bit in a 64-bit environment
long long - always 64 bit

Scary thing is... you are right and showed us all we were mostly wrong. :)

To verify, I built a command-line app that was 32-bit and 64-bit that ran the following code:


#include <stdio.h>

int main (int argc, const char * argv[]) {

    printf("Variable Sizes...\n");
    printf("short: %zu bytes\n", sizeof(short));
    printf("long: %zu bytes\n", sizeof(long));
    printf("int: %zu bytes\n", sizeof(int));
    printf("long long: %zu bytes\n", sizeof(long long));

    return 0;
}

Results on 32-bit:


Variable Sizes...
short: 2 bytes
long: 4 bytes
int: 4 bytes
long long: 8 bytes

Results on 64-bit:


Variable Sizes...
short: 2 bytes
long: 8 bytes
int: 4 bytes
long long: 8 bytes

lazydog
Jun 29, 2007, 10:07 AM
Results on 32-bit:


Variable Sizes...
short: 2 bytes
long: 4 bytes
int: 4 bytes
long long: 8 bytes

Results on 64-bit:


Variable Sizes...
short: 2 bytes
long: 8 bytes
int: 4 bytes
long long: 8 bytes

There's something I don't quite understand. sizeof() returns a value of type size_t which, if I remember correctly, is defined as unsigned int. So I guess the results above are fine for a 32-bit architecture, but for 64-bit, size_t wouldn't be big enough for all cases.

Out of interest perhaps you could print out the result of sizeof( size_t )?

thanks

b e n

Krevnik
Jun 29, 2007, 10:11 AM
There's something I don't quite understand. sizeof() returns a value of type size_t which, if I remember correctly, is defined as unsigned int. So I guess the results above are fine for a 32-bit architecture, but for 64-bit, size_t wouldn't be big enough for all cases.

Out of interest perhaps you could print out the result of sizeof( size_t )?

thanks

b e n

size_t is 8 bytes. If you want the actual definition itself, it isn't unsigned int:

#if defined(__GNUC__) && defined(__SIZE_TYPE__)
typedef __SIZE_TYPE__ __darwin_size_t; /* sizeof() */
#else
typedef unsigned long __darwin_size_t; /* sizeof() */
#endif

And elsewhere there is a typedef of __darwin_size_t to size_t. So the compiler (or the programmer) can define __SIZE_TYPE__ to force it to a particular size, but otherwise it is an unsigned long.

lazydog
Jun 29, 2007, 10:48 AM
Krevnik

Thanks for taking the time to answer.

b e n

gekko513
Jun 29, 2007, 10:54 AM
Krevnik

Thanks for taking the time to answer.

b e n

Even if size_t were only 32-bit, it would be more than big enough to hold the value returned by sizeof(long long). Since sizeof(long long) is 8, it only really needs a four-bit unsigned value to store the answer (binary 1000).

lazydog
Jun 29, 2007, 11:02 AM
Well, as an example of why size_t needs to be 64-bit, you could have something like

int big_array[ 16000000000 ] ;

size_t size = sizeof( big_array ) ;


if you had enough memory


b e n

gekko513
Jun 29, 2007, 11:36 AM
Well, as an example why size_t needs to be 64bit, you could have something like

int big_array[ 16000000000 ] ;
…
size_t size = sizeof( big_array )


if you had enough memory


b e n

Oh right. I misunderstood your question then.

MongoTheGeek
Jun 29, 2007, 12:08 PM
Well, as an example why size_t needs to be 64bit, you could have something like

int big_array[ 16000000000 ] ;
…
size_t size = sizeof( big_array )


if you had enough memory


b e n

You could cause some interesting breakages with that.

Krevnik
Jul 1, 2007, 10:11 AM
Even if size_t was only 32 bit it would be more than big enough to hold the values returned by sizeof(long long). Since sizeof(long long) is 8, it only really needs a four bit unsigned value to store the answer (binary 1000).

Even malloc() takes a size_t, so size_t needs to be the same bit size as your memory pointers.

ChrisA
Jul 1, 2007, 11:11 AM
There's something I don't quite understand. sizeof() returns a value of type size_t which, if I remember correctly, is defined as unsigned int. So I guess the results above are fine for 32bit architecture, but for 64 bit size_t wouldn't be big enough for all cases.

Out of interest perhaps you could print out the result of sizeof( size_t )?

thanks

b e n

"size_t" is the type that is used to hold the length of an object in memory, so by definition size_t is large enough to hold the length of any object in memory. Or to say it another way, it is impossible to define an object larger than what would fit in a variable of type size_t.