How is the size of an integer decided?

Discussion in 'Mac Programming' started by celia, Jun 25, 2007.

  1. macrumors newbie

    Joined:
    Jun 24, 2007
    #1
    Hi,

    How is the size of an integer decided?
    - Is it based on the processor, the compiler, or the OS?


    Thanks.
     
  2. macrumors 603

    gauchogolfer

    Joined:
    Jan 28, 2005
    Location:
    American Riviera
    #2
    Do you want to know what determines the maximum size of permitted integers (i.e. 32-bit versus 64-bit)?
     
  3. macrumors 6502

    Joined:
    Dec 6, 2006
    #3
    For Java, it is built into the runtime: an int is always 32-bit.

    For OS X and Windows, if memory serves me correctly, it is defined in types.h, which specifies how much memory is allocated for each type. It was done in terms of char, which was defined as

    #define char 1

    So an int used to be defined as

    #define int char*4

    It is probably no longer determined in terms of char, because what is a char? Is it 8 bits, 16 bits, or UTF-8, which can be any number of bits? wchar_t is 16 bits, but that isn't set in stone.
     
  4. thread starter macrumors newbie

    Joined:
    Jun 24, 2007
    #4
    Is it based on the 32/64-bit processor, or is it implementation dependent?

    Code:
    #define char 1
    
    So an int used to be defined as
    
    #define int char*4
    So it does not depend on the processor (32/64-bit)?
     
  5. macrumors regular

    Joined:
    Jul 5, 2005
    Location:
    London, UK
    #5
    Yes and no. C is/was designed to be a portable language, so you could take your programs and run them on another computer easily. Hence it will make the size of an int whatever you tell it. However, if you set an int to be 64-bit on a 32-bit machine, then the computer has to break the 64-bit word down into two 32-bit words, perform the operation on both parts sequentially, then stitch it back together, which is a *lot* slower than just doing it on a 32-bit integer.

    Hence, usually, the size of an int is set to whatever the size of the processor's ALU is; however, this is purely a performance choice.
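
    As a rough illustration of that extra work (a minimal sketch, not the code any particular compiler actually emits), a 64-bit addition on a 32-bit ALU amounts to adding the low halves, capturing the carry, then adding the high halves:

    Code:
    #include <stdio.h>
    #include <stdint.h>
    
    /* Sketch: emulate a 64-bit add using only 32-bit operations,
       roughly the work a compiler must generate for a 32-bit ALU. */
    static void add64(uint32_t a_hi, uint32_t a_lo,
                      uint32_t b_hi, uint32_t b_lo,
                      uint32_t *r_hi, uint32_t *r_lo)
    {
        *r_lo = a_lo + b_lo;             /* add the low words          */
        uint32_t carry = (*r_lo < a_lo); /* carry out of the low add   */
        *r_hi = a_hi + b_hi + carry;     /* add the high words + carry */
    }
    
    int main(void)
    {
        uint32_t hi, lo;
        add64(0x00000000u, 0xFFFFFFFFu,  /* 0x00000000FFFFFFFF */
              0x00000000u, 0x00000001u,  /* + 1                */
              &hi, &lo);
        printf("result: 0x%08X%08X\n", (unsigned)hi, (unsigned)lo);
        /* prints: result: 0x0000000100000000 */
        return 0;
    }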

    Chris
     
  6. macrumors 68020

    Krevnik

    Joined:
    Sep 8, 2003
    #6
    Uhm, this is a bit strange, since int is defined as a language keyword in C. It is defined by the compiler, not by headers. That is why you could (in theory) write an app with no headers if all it did was return a code based on some arguments, etc. (since main is supposed to return an int, but can return void).

    The compiler defines int as the ALU size (the processor bit-size), and I believe if you read K&R C and the ANSI C spec, this is the intended design. You /can/ override it with custom compilers, but then your compiler doesn't adhere to either C standard.

    In a C program, if you need to know the size of an int, you can use the sizeof(int) expression to do it (if for some reason you run on multiple architectures). Most platform APIs provide defined types which represent the preferred integer of the platform, to make it easier to write code for a platform that has both 32-bit and 64-bit APIs.
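
    A minimal sketch of that, using the standard intptr_t from <stdint.h> as one example of a "pointer-sized integer" type (the exact platform-provided names vary by API):

    Code:
    #include <stdio.h>
    #include <stdint.h>   /* intptr_t: an integer wide enough to hold a pointer */
    
    int main(void)
    {
        /* sizeof() reports the size of each type for the architecture
           this binary was compiled for. */
        printf("int:      %zu bytes\n", sizeof(int));
        printf("intptr_t: %zu bytes\n", sizeof(intptr_t));
        printf("void *:   %zu bytes\n", sizeof(void *));
        return 0;
    }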
     
  7. macrumors 68000

    GeeYouEye

    Joined:
    Dec 9, 2001
    Location:
    State of Denial
    #7
    The most careful thing you can do is to use the header-based typedef'd integer types:

    int32_t, uint32_t, int64_t, uint64_t, etc., assuming whatever you're coding will have the same libraries on all platforms. Otherwise, just watch out and use sizeof() liberally.
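
    A minimal sketch of that approach, assuming a C99 compiler that ships <stdint.h>:

    Code:
    #include <stdio.h>
    #include <stdint.h>   /* C99 fixed-width typedefs */
    
    int main(void)
    {
        int32_t  a = -42;          /* exactly 32 bits, signed   */
        uint64_t b = 1ULL << 40;   /* exactly 64 bits, unsigned */
    
        /* These sizes are the same on every conforming platform. */
        printf("int32_t:  %zu bytes\n", sizeof a);
        printf("uint64_t: %zu bytes\n", sizeof b);
        return 0;
    }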
     
  8. macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #8
    I would have to say the size of an integer is predicated on the size of the CPU's registers. However, certainly, a compiler could define any length it wanted and subsequently leverage, or work around, the actual hardware.

    In the early days of PCs, registers were 2 bytes (16 bits); thus, a "word" was 2 bytes, and so was an integer.

    As processors evolved, and addressing did too, registers grew to 4 bytes, and thus it made sense to define an integer as 4 bytes.

    With 64-bit processors, registers (on the mainframe, at least) are 8 bytes. However, we have not seen integers evolve to 8 bytes, and I'm guessing they will be 4 bytes for some time to come. On an IBM 64-bit box, for example, with 8-byte registers, if the 4-byte instructions are used (AKA 31-bit instructions), only the right half of an 8-byte register is used. No muss, no fuss.

    Todd
     
  9. macrumors 68020

    Krevnik

    Joined:
    Sep 8, 2003
    #9
    This is how it works with the Core 2 Duo (x64) and the G5 (PPC64) as well. 32-bit mode only uses half of the 8-byte register available, but if you are running a 64-bit clean app (compiled for 64-bit), then your int will be 8 bytes, as will your pointers.
     
  10. macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #10
    Yeah, same on the mainframe. It's a compiler option whether to exploit full 64-bit or not. However, the terminology for an "integer" is still 4 bytes. If referring to 64-bit integers, we say "8-byte integers" or "double-word integers". A word is still 4 bytes.

    Todd

    PS: Just as an FYI, I want to point out that I did not make a typo when I said "31-bit". PCs / Macs / other machines might be 32-bit, but IBM machines, until recently, only addressed 31 bits. The high-order bit (left-most bit) was used to indicate addressing mode, which could be 24-bit mode (<= 16MB) when off or 31-bit mode (<= 2GB) when on. On the 64-bit machines, there are other mechanisms to set and query addressing mode.

    (And, to be complete, yes, registers on a mainframe are certainly a full 32 bits, allowing for 32 bits of precision, but only 31 bits of memory can be addressed.)
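
    For illustration only (a hypothetical sketch, not actual mainframe code): treating the top bit of a 4-byte address as a mode flag leaves 31 usable address bits, which is easy to see with a couple of masks:

    Code:
    #include <stdio.h>
    #include <stdint.h>
    
    int main(void)
    {
        /* Hypothetical 4-byte address word: high bit = addressing-mode flag,
           remaining 31 bits = the address itself (max 2GB). */
        uint32_t raw      = 0x80000000u | 0x00123456u;
        uint32_t mode_bit = raw >> 31;          /* 1 = 31-bit mode        */
        uint32_t address  = raw & 0x7FFFFFFFu;  /* keep the lower 31 bits */
    
        printf("mode bit: %u, address: 0x%08X\n",
               (unsigned)mode_bit, (unsigned)address);
        return 0;
    }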
     
  11. macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #11
    Here's what the C99 spec has to say:
     
  12. macrumors 68020

    Krevnik

    Joined:
    Sep 8, 2003
    #12
    This seems to speak to the mainframes you have been exposed to. Different architectures use different terminology. Windows still defines a word as 16-bit, and a 32-bit int is a double word. Mainframes I have worked on used 48-bit words (as you can imagine, anything that uses bit-packing and assumes 48-bit word boundaries is interesting to port to home architectures).

    In C-speak, an int is an int and is the size of the CPU registers (as stated quite simply by iSee). char is a byte, short is a 16-bit integer, longs are 32 bits, and long longs are 64 bits. wchar_t is a 16-bit unsigned integer for UTF-16 support. Those definitions don't change just because the metal does. Those who do assembly work tend to be a bit more interested in the specific definitions of what a word is.
     
  13. macrumors 6502a

    Sayer

    Joined:
    Jan 4, 2002
    Location:
    Austin, TX
    #13
    Platform bickering aside, the size of an integer is determined by the total number of possible values it may represent.

    E.g. a byte, or 8 bits, can have one of 256 different possible values, from 0-255, or 0x00 to 0xFF in hex. This would be an unsigned int, btw, meaning it has only positive values. A signed integer can be positive or negative, and this affects the range of possible values such that you can represent from -128 to +127.

    A "short" int in old-school Mac parlance, typically two bytes or 16 bits and typed as UInt16 on the Mac now, has a max. (unsigned) value of 65,535, or 0xFFFF in hex.

    A "long" is a 32-bit value, or four bytes, and has a max. (unsigned) value of 4,294,967,295, or 0xFFFFFFFF in hex. The type in Mac programming (Carbon, typically) is UInt32.

    A float is typically 32 bits wide, and a double is 64 bits, I believe (the maximum value may depend on the implementation).

    The PowerPC accesses memory in chunks sized as multiples of 4 bytes, so making memory structures aligned to four bytes is more efficient for the CPU. Example:

    Code:
    struct myStruct {
    
      UInt32   myVal;            // 4 bytes
      UInt16   halfVal;
      UInt16   otherHalfVal;     // two 16s = 32 bits, or 4 bytes
      UInt16   someOtherHalfVal; // only 2 bytes
      UInt16   fillerVal;        // fills the leftover 16 bits to keep the struct
                                 // 4-byte aligned; also leaves room for expansion
    
    };
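
    A minimal sketch of how to check that layout yourself, using the C99 fixed-width types in place of the Carbon UInt32/UInt16 typedefs:

    Code:
    #include <stdio.h>
    #include <stddef.h>   /* offsetof */
    #include <stdint.h>
    
    struct myStruct {
        uint32_t myVal;
        uint16_t halfVal;
        uint16_t otherHalfVal;
        uint16_t someOtherHalfVal;
        uint16_t fillerVal;
    };
    
    int main(void)
    {
        /* offsetof() shows where each member lands; sizeof() shows the
           total size, including any padding the compiler adds. */
        printf("offset of halfVal:          %zu\n", offsetof(struct myStruct, halfVal));
        printf("offset of someOtherHalfVal: %zu\n", offsetof(struct myStruct, someOtherHalfVal));
        printf("sizeof(struct myStruct):    %zu bytes\n", sizeof(struct myStruct));
        return 0;
    }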
     
  14. macrumors 6502

    Joined:
    Dec 6, 2006
    #14
    For C, int is defined by your compiler. The compiler is compiled using a C environment and an int is defined using types.h.

    This is why you cannot trust that an int is 32 or 64 bits. You must always, always use sizeof when mallocing memory. NSInteger in Leopard is just syntactic sugar that is globally set in your project.

    It doesn't matter what your registers hold on the CPU; it hasn't for a good 20 years. When ANSI made C an actual certified language, the sizeof an int was set to 4 bytes. This is true for all C environments. Those who code on ****** compilers in embedded environments use a variant of C known as ****tard C.
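
    A minimal sketch of the sizeof-with-malloc habit (a hypothetical example):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void)
    {
        size_t count = 100;
    
        /* Let sizeof supply the element size instead of hard-coding 4,
           so the code stays correct if int is a different width. */
        int *values = malloc(count * sizeof *values);
        if (values == NULL)
            return 1;
    
        values[0] = 42;
        printf("allocated %zu bytes\n", count * sizeof *values);
        free(values);
        return 0;
    }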
     
  15. macrumors member

    Joined:
    Jun 11, 2004
    Location:
    UK
    #15
    No need to get abusive.
    Some of us have to make a living in those embedded environments, thank you very much!

    A char is at least 8 bits.
    An int is at least 16 bits.
    These values can be greater, but not smaller.

    From the C99 standard:

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

    Section 5.2.4.2.1 Sizes of integer types <limits.h>

    "...implementation defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign."

    — number of bits for smallest object that is not a bit-field (byte)
    CHAR_BIT 8

    — minimum value for an object of type int
    INT_MIN -32767 // −((2^15) − 1)
    — maximum value for an object of type int
    INT_MAX +32767 // (2^15) − 1
    — maximum value for an object of type unsigned int
    UINT_MAX 65535 // (2^16) − 1
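
    A minimal sketch that prints what a particular compiler actually provides (the values must be at least as large in magnitude as the minimums above):

    Code:
    #include <stdio.h>
    #include <limits.h>   /* CHAR_BIT, INT_MIN, INT_MAX, UINT_MAX */
    
    int main(void)
    {
        /* The standard only sets minimum magnitudes; an implementation
           is free to provide more. */
        printf("CHAR_BIT: %d\n", CHAR_BIT);
        printf("INT_MIN:  %d\n", INT_MIN);
        printf("INT_MAX:  %d\n", INT_MAX);
        printf("UINT_MAX: %u\n", UINT_MAX);
        return 0;
    }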
     
  16. macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #16
    Using gcc:

    int - 32 bit
    long - 32 bit in a 32-bit environment, 64 bit in a 64-bit environment
    long long - always 64 bit
     
  17. macrumors 68020

    Krevnik

    Joined:
    Sep 8, 2003
    #17
    Scary thing is... you are right and showed us all we were mostly wrong. :)

    To verify, I built a command-line app as both 32-bit and 64-bit and ran the following code:

    Code:
    #include <stdio.h>
    
    /* sizeof yields a size_t, so print it with the %zu conversion. */
    int main (int argc, const char * argv[]) {
    
        printf("Variable Sizes...\n");
        printf("short: %zu bytes\n", sizeof(short));
        printf("long: %zu bytes\n", sizeof(long));
        printf("int: %zu bytes\n", sizeof(int));
        printf("long long: %zu bytes\n", sizeof(long long));
    
        return 0;
    }
    Results on 32-bit:

    Code:
    Variable Sizes...
    short: 2 bytes
    long: 4 bytes
    int: 4 bytes
    long long: 8 bytes
    Results on 64-bit:

    Code:
    Variable Sizes...
    short: 2 bytes
    long: 8 bytes
    int: 4 bytes
    long long: 8 bytes
     
  18. macrumors 6502a

    Joined:
    Sep 3, 2005
    Location:
    Cramlington, UK
    #18
    There's something I don't quite understand. sizeof() returns a value of type size_t which, if I remember correctly, is defined as unsigned int. So I guess the results above are fine for a 32-bit architecture, but for 64-bit, size_t wouldn't be big enough for all cases.

    Out of interest perhaps you could print out the result of sizeof( size_t )?

    thanks

    b e n
     
  19. macrumors 68020

    Krevnik

    Joined:
    Sep 8, 2003
    #19
    size_t is 8 bytes. If you want the actual definition itself, it isn't unsigned int:

    Code:
    #if defined(__GNUC__) && defined(__SIZE_TYPE__)
    typedef __SIZE_TYPE__		__darwin_size_t;	/* sizeof() */
    #else
    typedef unsigned long		__darwin_size_t;	/* sizeof() */
    #endif
    And elsewhere there is a typedef of __darwin_size_t to size_t. So the compiler (or the programmer) can define __SIZE_TYPE__ to force it to a particular size, but otherwise it is an unsigned long.
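
    A minimal check (sketch), using the C99 %zu conversion for size_t:

    Code:
    #include <stdio.h>
    #include <stddef.h>   /* size_t */
    
    int main(void)
    {
        /* 4 bytes when built 32-bit, 8 bytes when built 64-bit. */
        printf("size_t: %zu bytes\n", sizeof(size_t));
        return 0;
    }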
     
  20. macrumors 6502a

    Joined:
    Sep 3, 2005
    Location:
    Cramlington, UK
    #20
    Krevnik

    Thanks for taking the time to answer.

    b e n
     
  21. macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #21
    Even if size_t were only 32-bit, it would be more than big enough to hold the values returned by sizeof(long long). Since sizeof(long long) is 8, it only really needs a four-bit unsigned value to store the answer (binary 1000).
     
  22. macrumors 6502a

    Joined:
    Sep 3, 2005
    Location:
    Cramlington, UK
    #22
    Well, as an example of why size_t needs to be 64-bit, you could have something like:

    Code:
    int big_array[ 16000000000 ];
    …
    size_t size = sizeof( big_array );
    
    if you had enough memory


    b e n
     
  23. macrumors 603

    gekko513

    Joined:
    Oct 16, 2003
    #23
    Oh right. I misunderstood your question then.
     
  24. macrumors 68040

    MongoTheGeek

    Joined:
    Sep 13, 2003
    Location:
    Its not so much where you are as when you are.
    #24
    You could cause some interesting breakages with that.
     
  25. macrumors 68020

    Krevnik

    Joined:
    Sep 8, 2003
    #25
    Even malloc() takes a size_t, so size_t needs to be the same bit size as your memory pointers.
     
