PDA

View Full Version : **Very** basic question




mdeh
Mar 1, 2009, 01:16 PM
Nearing the end of Kochan's book, and finally getting the hang of memory mangement of Obj C.

Which got me thinking.

In learning C, learnt how to make use of malloc/calloc/free etc.

But things like;

int i;

or int myArr[20]

or struct myStruct{ }; were never released, or allocated.

How is that memory managed?

thanks and apologize in advance.



WhiteRabbit
Mar 1, 2009, 01:23 PM
These non-dynamic variables are managed by the application and are automatically allocated and released at their declaration and end of scope.

lee1210
Mar 1, 2009, 01:37 PM
While it is implementation specific, these items normally live on the stack. In most implementations of C i know of, each function gets its own stack frame (except when it doesn't, i.e. inline). When a new stack frame is pushed onto the stack, all of the items declared have space set aside for them, and their offsets from the top of the stack frame are predetermined. When execution of the function completes, the stack frame is "popped", and the memory it was using is available for a new stack frame (likely of a different size) to be pushed on. Pointers that are locally-declared also live on the stack, but when you malloc, etc. they end up pointing *to* something on the heap.

As an aside, in an earlier thread you asked about NSMakeRange, which returns a struct (and not a pointer to one, either). This disturbed me at first, because I didn't know where the memory for this was coming from, if it was going to be released, if the same memory was used in every NSMakeRange invocation, etc. After a bit of reading I found that this function is inline, and will simply use some space on the stack, so when your function is finished, the memory goes away with the rest of the frame.

-Lee

mdeh
Mar 1, 2009, 01:49 PM
While it is implementation specific, these items normally live on the stack. .......When execution of the function completes, the stack frame is "popped", and the memory it was using is available for a new stack frame (likely of a different size) to be pushed on. Pointers that are locally-declared also live on the stack, but when you malloc, etc. they end up pointing *to* something on the heap.-Lee

Thank you Lee.
So, there seem to be 2 ways to allocate memory...stack vs heap. And, from what I have just "googled" :-), heap seems to be used for dynamic memory allocation, which is what I have been learning with Obj C.

So, to push you a little, as you seem to have the broad overview of understanding that I can only wish to achieve!!!....what are the very **very** broad pros and cons of each. If this is really extensive, please just ignore, but if there is a nice conceptual nugget here, I would love to hear it.

mdeh
Mar 1, 2009, 01:49 PM
These non-dynamic variables are managed by the application and are automatically allocated and released at their declaration and end of scope.
thank you

lee1210
Mar 1, 2009, 02:27 PM
Thank you Lee.
So, there seem to be 2 ways to allocate memory...stack vs heap. And, from what I have just "googled" :-), heap seems to be used for dynamic memory allocation, which is what I have been learning with Obj C.

So, to push you a little, as you seem to have the broad overview of understanding that I can only wish to achieve!!!....what are the very **very** broad pros and cons of each. If this is really extensive, please just ignore, but if there is a nice conceptual nugget here, I would love to hear it.

The stack is generally limited in size. This varies per implementation, system, etc. but the main thing i try to differentiate by is size and scope. If you need access to something outside of a local function, it should be on the heap so references to it stay good. An exception would be passing a reference to an item in the current function to a lower function which isn't going to result in this reference being stored. In this case, you know your reference to something on the stack will stay good, since your stack frame will still be on the stack during the execution of the function being called. Otherwise, if many functions will need to get at some variable, it will need to be on the heap. That way a pointer to this area of memory will stay good no matter what happens on the stack (as long as it's not prematurely free'd, etc.).

Size is the other one I mentioned. It's best to keep stack frames small, so you can have more of them. If you make no function calls, and only have a main() (i don't know what sort of thing that could possibly be...), using a lot of the stack isn't a big problem. But if you are going to be making (potentially deep) function calls, you're "sharing" the total space of the stack with all of those functions you'll be calling down into. Not all at once, mind you, but whatever your call tree looks like at any one time will be what's on the stack, sharing the memory available to it. For example:

void x();
void y();
void z();
void w();
int main(int argc,char *argv) {
int a;
char b[100000];
x();
}
void x() {
int a[100000];
y();
}
void y() {
double a[100000];
z();
}
void z() {
void *a[100000];
w();
}
void w() {
struct mybigstruct a[100000];
//... Does some stuff
}


When you called x from main, your call tree at the deepest would look like:
main
x
y
z
w

Each of these functions would have a stack frame on the stack. There's some extra overhead with each frame, but if we just sum the local data in here, saying that sizeof(mybigstruct) is 64 bytes, we get:
main 4+1*100000 = 400004 bytes
x 100000*4 = 400000 bytes
y 100000*8 = 800000 bytes
z sizeof(void *) * 100000 = platform dependent, but we'll say 800000 bytes
w 100000*64=6400000 bytes

so total we've got 8800004 bytes. If you had an 8MB stack size, you'd overflow it when you hit w, and bad things will happen. Obviously this is a contrived example, but the point is that you should be vigilant about the size of the data you keep locally. A few dozen pointers and other primitives isn't going to hurt much, but a large list, especially of structures, can be devastating.

One area that can really be costly in terms of stack space is the use of recursive algorithms. Recursion can be a great way to solve a problem, but if you aren't careful about your stack it can be costly. Consider this example of factorial in C:

#include <stdio.h>
long int factorial(int);
int main(int argc, char *argv[]) {
int baseval = 0;
long int result = 0L;
if(argc < 2) return -1;
baseval = atoi(argv[1]);
result = factorial(baseval);
printf("%d! = %ld\n",baseval,result);
}

long int factorial(int start) {
if(start <= 0) return 0;
if(start == 1) return 1;
return start*factorial(start-1);
}


The user gets to enter a value at the command line. This means they get to determine our call depth, because we're making calls N deep where N is the entered value. In this case, factorial is pretty trim. One 4-byte argument, an 8-byte return value... not too bad. It can probably last for quite a large entered value, but it will hit the stack ceiling eventually. If you added a large array for some reason to factorial, the values you could calculate would be greatly reduced. Obviously this algorithm doesn't need to be performed recursively, and can be done easily in a loop, but the purpose was for demonstration. There are languages (lisp, haskell, others I'm sure) that can do this sort of tail-recursion in constant stack-space. I just checked and had an example that was done in haskell to sum n + n-1 + n-2 + ... + 1:

deepcall :: [Integer] -> Integer -> Integer
deepcall [] acc = acc
deepcall (x:xs) acc = deepcall xs $! (x + acc)

makedeep :: Integer -> Integer
makedeep n = deepcall [1..n] 0


It's not totally straight forward, but it will run in constant stack space for "any" value. I tried it up to some very large numbers and it didn't have a problem. The $! business was to prevent lazy-evaluation. That's a long story that doesn't need to be told now, but the point is that due to the stack-frame method used in any C implementation i have encountered (which most systems lend themselves to at the microarchitectural level) it isn't always the best choice for a deep-recursing algorithm.

I don't think I've said anything about pros or cons... maybe... i guess... =)
Basically, declaring stack-local variables is handy. You don't need to allocate them, you don't need ANOTHER stack-local pointer variable to point to them, they are easy to refer to, and you can "forget" them, because they'll go away as soon as your function call ends. There is the limitations discussed above, with limited stack space, though. For heap variables, you have much more space, but you have to be more careful with what you allocate, the pointers to them, keeping track of the pointer to free the space, etc. With Objective-C you get off a little easier, because your Objects are all on the heap, and there is garbage collection in 2.0, and the retain/release/autorelease system in Cocoa otherwise that helps with managing this memory. I haven't had to use malloc in Objective-C, but I haven't done that much work with it, and I'm sure there are reasons sometimes. Basically the tradeoff, in my mind, is space available vs. the complexity of managing in. You can malloc a GB of memory, that's great, but you better remember it, because that's a hell of a leak if you forget.

-Lee

mdeh
Mar 1, 2009, 02:48 PM
The stack is generally limited in size. This varies per implementation, system, etc. but the main thing i try to differentiate by is size and scope. If you need access to something outside of a local function, it should be on the heap so references to it stay good.. ........
Size is the other one I mentioned. It's best to keep stack frames small, so you can have more of them. If you make no function calls, and only have a main() (i don't know what sort of thing that could possibly be...), using a lot of the stack isn't a big problem. But if you are going to be making (potentially deep) function calls, you're "sharing" the total space of the stack with all of those functions you'll be calling down into. Not all at once, mind you, but whatever your call tree looks like at any one time will be what's on the stack, sharing the memory available to it.

That is more than I could have asked for! The example clarifies this too.



I don't think I've said anything about pros or cons... maybe... i guess... =)
Basically, declaring stack-local variables is handy. You don't need to allocate them, you don't need ANOTHER stack-local pointer variable to point to them, they are easy to refer to, and you can "forget" them, because they'll go away as soon as your function call ends. There is the limitations discussed above, with limited stack space, though. For heap variables, you have much more space, but you have to be more careful with what you allocate, the pointers to them, keeping track of the pointer to free the space, etc. With Objective-C you get off a little easier, because your Objects are all on the heap, and there is garbage collection in 2.0, and the retain/release/autorelease system in Cocoa otherwise that helps with managing this memory.

Lee...as always...thank you very much. This is a true ( scary) eye opener. I had never even considered the issue of stack vs heap. Thank you for summarizing it so well.
Michael.

Flynnstone
Mar 1, 2009, 04:05 PM
Just to add a bit more info ...

/* filefoo.c */

int Prog_x; /* program scope, allocated at start of program
essentially on heap */
static int Prog_y; /* same as previous, but scope is only this file */

int main() {

int main_x; /* program scope, allocated on stack */
static int main_y; /* program scope, allocated on heap at start */

foofunc();
}

int foofunc( int x )
{
int foo_x; /* function scope, allocated on stack */
static int foo_y /* function scope, allocated on heap at start
variable survives after function exit */
}



I hope I've helped and not confused. We can get into "volatile" and "const".

mdeh
Mar 1, 2009, 04:47 PM
The stack is generally limited in size.

Sorry for the short answer earlier, but my wife wanted to go for a walk before it started raining...which it did, in any case. But the advantage of long walks is that I have a ready audience to try out the concepts I do not fully understand!!! :D
But seriously, I did have a couple of insights. Firstly, in learning something totally new, there are a number of key concepts. Very often, from my perspective at least, one is often not sure it is what one does not understand. So this thread is a perfect example, where what I had asked initially, asked about something which has often puzzled me. When this lead to heaps and stacks, it lead again to the very issue of why use one or the other. So, for me, this little "insignificant" thread has really opened my eyes to a very basic fundamental of understanding of the difference between something like C and Obj-C...which I would like to run by you.

It seems to me that the stack is essentially used for "proceedural" operations..ie push 4, followed by 4, add and pop 8 or 16 or whatever the operation is deemed to be. It would make no sense to store a string this way, ( well I hope so) that you would need to arbitrarily access. Contrast this with a heap, in which data is stored by address and **can** be arbitrarily accessed. Now, I supposed that a language like Obj-C does use the stack from time to time, but only in the constraint of some procedural endeavor.

Well, I hope this makes some kind of sense, as it has been exciting to discover something new.

mdeh
Mar 1, 2009, 04:58 PM
Just to add a bit more info ...

/* filefoo.c */

int Prog_x; /* program scope, allocated at start of program
essentially on heap */
static int Prog_y; /* same as previous, but scope is only this file */

int main() {

int main_x; /* program scope, allocated on stack */
static int main_y; /* program scope, allocated on heap at start */

foofunc();
}

int foofunc( int x )
{
int foo_x; /* function scope, allocated on stack */
static int foo_y /* function scope, allocated on heap at start
variable survives after function exit */
}



I hope I've helped and not confused. We can get into "volatile" and "const".

Thank you...it all helps.

lee1210
Mar 2, 2009, 09:50 AM
It seems to me that the stack is essentially used for "proceedural" operations..ie push 4, followed by 4, add and pop 8 or 16 or whatever the operation is deemed to be. It would make no sense to store a string this way, ( well I hope so) that you would need to arbitrarily access. Contrast this with a heap, in which data is stored by address and **can** be arbitrarily accessed. Now, I supposed that a language like Obj-C does use the stack from time to time, but only in the constraint of some procedural endeavor.

Well, I hope this makes some kind of sense, as it has been exciting to discover something new.

This is a little off. What you are discussing is a means of writing a simple stack-based calculator with RPN data entry, which is an interesting task, but not exactly the way stack frames work during program execution. Times like these I wish I had some sort of digitizing whiteboard, because a few drawings would help describe this quite a bit. I'll try to make do with words and perhaps a few ascii illustrations.

Some of this is implementation-specific, so I'll try to speak in somewhat broad terms... a stack frame will generally contain a return address to the caller (where execution will begin once the current function/subroutine completes), the arguments passed to the subroutine, the local data for the subroutine, and normally some space to store the return value of the function if there is one. There are a few additional addresses involved that may be stored in a special register if the microarchitecture supports it. One is the base of the stack. This is generally unimportant to the current subroutine. The other is the current stack pointer, which is the position in memory where the current stack frame begins. The stack pointer for the previous stack frame must also be kept and restored when the current stack frame is popped.

Where arguments are placed in memory, where local storage is kept in memory, etc. are all based on an offset from the stack pointer, rather than an absolute position in memory. This means that when a function is called, a particular variable will always be located 32 bytes past the stack pointer. This way, no matter where the stack frame is in memory, a simple calculation can be used to find the address of this variable.

When a new subroutine is called, the stack pointer will be set just after the current stack frame (perhaps a little further for alignment purposes based on the machine), the arguments will be stored in the appropriate pre-determined positions offset from the stack pointer, the previous stack pointer and next address will be stored in a position offset from the new stack pointer, and the new subroutine can begin execution. Once it has finished, all of this is "popped", insofar as the stack pointer is set back to the proper position at the beginning of the previous stack frame. This process repeats at the end of each subroutine's execution, restoring the state to the previous stack frame.

In your post you mentioned storing a string in terms of a list of characters. In the case of a C-style string, you can certainly store the array of characters on the stack. There's simply a fixed number of bytes equaling the length of the character array set aside, with the base of the array starting at a fixed offset from the stack pointer. The "type" of data really isn't too important when it comes to laying out a stack frame, just the size.

It may not be of particular interest at this point, but depending on the machine, alignment in memory may be important. This might mean that there's some unused bytes on the stack that are simply padding. So:

void mySub() {
char a;
int b;
char c;
short d;
double e;
char f;
void *g;
//...
}

We'll assume (even if it's not always the case) that the compiler wants these variables laid out in the order the programmer declared them, and the machine requires no alignment for a char (1-byte), 2-byte alignment for a short (2-bytes), 4-byte alignment for int,float(4-byte), and 8 byte alignment for double, and *s(8-byte, assuming pointers are 8-byte on this machine). So, things might be laid out like this:
sp+0 = char a (uses 1 byte, next free is sp+1)
sp+4 = int b (needs 4-byte alignment, uses 4 bytes, next free is sp+8)
sp+8 = char c (next free is sp+9)
sp+10 = short d (needs 2-byte alignment, uses 2 bytes, next free is sp+13)
sp+16 = double e (needs 8-byte alignment, uses 8 bytes, next free is sp+24)
sp+24 = char f (next free is sp+25)
sp+32 = void *g (needs 8-byte alignment, uses 8 bytes, next free is sp+40)

So we have 25 bytes worth of data, but we used up 40 bytes of space to store them so they are aligned. If we were really worried about stack space, and the order the variables were laid out didn't matter, we could always arrange things from the items with the smallest size to items of the largest size, which would leave a lot less space for padding. In this case we'd have 3 chars, and need just 1 byte for padding before the short, 2 bytes of padding after the short before the int, the 4 bytes of padding before the double and void *. This would be only 7 bytes of padding instead of 15, and in some cases. This is possible with singletons, but if you have a structure that needs an absolute layout in memory, this kind of optimization is not possible. The programmer might optimize for less padding, but the compiler cannot.

All of that alignment business was a bit off topic, but at the point we're talking about how the stack frames are being laid out in memory I thought it might be worth mentioning.

-Lee

mdeh
Mar 3, 2009, 11:05 AM
This is a little off.

You are being kind!
But, your description of the workings of the stack has given me a nice peg onto which to hang the original question, so I thank you for that.
Michael.