
View Full Version : pointers - please explain




satans_banjo
Dec 16, 2005, 11:44 AM
Okay - I've scoured the internet for an explanation I understand. What are pointers, why would I use them, and is there anything else I need to know?



balamw
Dec 16, 2005, 11:59 AM
Okay - I've scoured the internet for an explanation I understand. What are pointers, why would I use them, and is there anything else I need to know?
Have a look at this: http://www.cprogramming.com/tutorial/lesson6.html

B

satans_banjo
Dec 16, 2005, 12:25 PM
Yeah, I've already looked at that. I'm wondering why anyone would use pointers. Have you got any examples of applications for pointers? (By applications I don't mean programs, I mean practical uses.)

Fukui
Dec 16, 2005, 12:29 PM
Okay - I've scoured the internet for an explanation I understand. What are pointers, why would I use them, and is there anything else I need to know?
Try not to think of pointers as anything special; don't let the &blabla and *blabla syntax confuse you.

Think of a straight list of items (the memory in a computer), from 0 to 128.

Normal variables like an int or a float take up a certain number of those 128 slots.

For example, an int typically takes 4 slots in the list. So somewhere in that list of 128 there is a group of 4 slots that holds the value of the int. But where does it start? We don't know, so a pointer points to the first slot of the group. So (int *myInteger) might mean: starting at slot 124, an int is held.

BUT, the pointer just holds the location; to use the value stored there, it needs to be "dereferenced."

Also, when you create an int without a pointer, internally it still lives at some location (124, in the example); the reason you use int *myInt is so that YOU can access the variable's location in memory. You can also pass a pointer around, and change the pointer (the location it holds) without changing the actual value of the original.

A double pointer ** just means that this pointer points to a location in memory, and that location in turn points to another location. You can think of it like links on a web page: they just point to different places, and at the end is the first slot of the actual item.

It's faster to pass a pointer (to a function) than to pass a variable, because some things such as structures are large to copy, while a pointer is just 4 bytes long (on a 32-bit system). Passing the memory location is like saying "it's located here" instead of handing the whole large thing to someone.

satans_banjo
Dec 16, 2005, 12:53 PM
So if I declare a pointer to an int, then printf the pointer, would that print the memory address of the int?

mrichmon
Dec 16, 2005, 01:10 PM
Yeah, I've already looked at that. I'm wondering why anyone would use pointers. Have you got any examples of applications for pointers? (By applications I don't mean programs, I mean practical uses.)

Every application you use relies heavily on pointers.

A pointer (or memory reference) ultimately allows the programmer to accomplish one of two things:


dynamically allocate memory for data elements and operate over them when the number of elements is not known until application runtime.
build arbitrary in-memory data structures to represent and/or store some application-specific information.


An example of the first is in a Mail application. The developer does not know how many pieces of mail you will have in your inbox. So instead of building an array to hold the messages the developer builds up some form of list structure. Generally this means that the code only holds onto a pointer to the start of the list, then knows how to traverse the list, add elements to the list and remove elements from the list. The upper size limit for the list is now dependent on the amount of memory available.

An example of the second item is a list as roughly described above. But alternatively, you can think of a tree data structure. A simple binary tree consists of nodes that hold a data value and a "left" and a "right" pointer. Data is stored in the tree so that it is sorted according to particular criteria. One common set of criteria is that elements with a "lower" value are stored in the left branches of the tree and "higher" values are stored to the right. This type of scheme allows much faster access to individual data elements than can be achieved with flat structures. A tree (though not a binary tree) is the primary internal structure used to represent a web page during rendering. Safari, along with every other graphical web browser, uses some form of tree while rendering a page.

If you are studying computer science then the next class after you learn how to use pointers is usually a class on different types of data structures and the trade-offs with each structure.

The key concept, and often one of the most difficult for people to grasp, is that a byte of memory in a computer just stores a number. That number can represent an integer, it can represent a character, it can represent the colour of a pixel, or it can be the address of some other memory location, since memory locations are numbered from 0 at the beginning of memory, increasing by one for each byte. A pointer is just a piece of memory that stores the address of another piece of memory.

satans_banjo
Dec 16, 2005, 01:20 PM
So I've gathered this so far:


int variable; //declares a variable
variable = 4; //assigns a value to the variable
int *pointer; //declares the pointer
pointer = &variable; //sets the pointer to the memory address of the variable
*pointer = 5; //dereferences the pointer and changes the value of variable to 5


so is that correct?

Fukui
Dec 16, 2005, 01:26 PM
So if I declare a pointer to an int, then printf the pointer, would that print the memory address of the int?
Yes, exactly (strictly speaking, the %p format specifier is the portable way to print an address).

mrichmon
Dec 16, 2005, 01:30 PM
So if I declare a pointer to an int, then printf the pointer, would that print the memory address of the int?

If you declare:


int a = 10;
int *b;


then
printf("%d\n", a) will output "10".
printf("%d\n", b) will output the value stored in "b", that is, the memory address b points to (garbage until b has been assigned).
printf("%d\n", *b) will output whatever value you have assigned to the int variable that b points to. (Assuming you have previously allocated the variable using malloc and assigned it a value.)
printf("%d\n", &a) will output the memory address of the variable "a". (The storage for variable a is automatically allocated by the compiler.)
printf("%d\n", &b) will output the memory address of the pointer variable "b" itself. (This is the location where the value of the pointer is stored, not the location of the integer that b points to.)

A good way to understand all of this is to write up a small program such as the following and play with the print statements to understand what is going on:


#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int a = 10;
    int *b;

    b = (int *)malloc(sizeof(int)); // allocate memory for the int pointed to by b

    *b = 20; // set value of the int pointed to by b

    // (strictly, %p with a (void *) cast is the portable way to print
    // addresses; %d is used here so the values print as plain decimals)
    printf("value of 'a': %d\n", a);
    printf("value of 'b': %d\n", b);
    printf("value of '*b': %d\n", *b);
    printf("value of '&a': %d\n", &a);
    printf("value of '&b': %d\n", &b);

    free(b);
    return 0;
}


The output will be something like:


value of 'a': 10
value of 'b': 5243120
value of '*b': 20
value of '&a': -1073745000
value of '&b': -1073744996


In this run, malloc handed the pointer "b" the memory address 5243120 to point to. This means that "20" is stored in memory location 5243120. The pointer itself is stored in memory location -1073744996 (it prints as a negative number because %d interprets the address bits as a signed integer; it's an artifact of how application memory is laid out, which you don't need to worry about right now.)

So, in this code, variable "a" is stored in memory address "-1073745000" so the value of this memory location is "10". Variable "b" is stored in memory address "-1073744996" and has the value "5243120". The memory address "5243120" is the location pointed to by "b" and contains the value "20".

Fukui
Dec 16, 2005, 01:38 PM
so i've gathered this so far:


*pointer = 5; // gets the value of variable (pointed to by the pointer) and changes it to 5


Yes, as long as pointer was declared as int *pointer (and already points at a valid int).
So if its,

int *pointer;
*pointer = 5;

Then yes, but if its,

int *pointer;
pointer = 5;

then this just changes the memory address that it's pointing to (and the compiler will warn about assigning a plain integer to a pointer).

satans_banjo
Dec 16, 2005, 01:58 PM
Ah okay, I've grasped the concept. I guess some practice will help me learn the syntax a bit better.

Thanks to everyone who posted. You know you're on a great forum when people can explain a difficult programming concept to someone like me (I don't do much programming), and so quickly too! :)

EDIT: one last question, out of curiosity: do any developers target a particular memory address for a specific variable? For example, would they change the value of &variable to make the program run more predictably?

Fukui
Dec 16, 2005, 02:22 PM
EDIT: one last question, out of curiosity: do any developers target a particular memory address for a specific variable? For example, would they change the value of &variable to make the program run more predictably?

They might pass a reference to the variable if the variable is too big to pass efficiently. For example, in Cocoa an object can be very big, so calling a function like doSomethingWithObject(NSObject) would pass a copy of the variable (the NSObject), which would be very costly if it were big (variables passed to a function are copied). But doSomethingWithObject(NSObject *) passes the reference (pointer), so it only copies 4 bytes on a 32-bit system, whereas copying the whole thing might mean 1 or 2 MB or more. Plus, if it's copied you can't change the original thing passed to you; you have to pass it back again and waste memory.

It's like placing a link in a mail message instead of including the whole web page in the mail: you can click it and go if you want, and the mail is smaller and faster to download.


EDIT: one last question: out of curiosity, do any developers target a particular memory address for a specific variable?

Sometimes, if variables are located next to each other, it can be faster when the data needs to fit inside a processor's cache. But in general it's an unnecessary hassle to worry about the particular address a variable is located at and place it "there"; a good OS and runtime handle this transparently.

satans_banjo
Dec 16, 2005, 02:50 PM
It's like placing a link in a mail message instead of including the whole web page in the mail: you can click it and go if you want, and the mail is smaller and faster to download.

That's the best explanation I've seen. I'll remember that.

mrichmon
Dec 16, 2005, 03:43 PM
It's like placing a link in a mail message instead of including the whole web page in the mail: you can click it and go if you want, and the mail is smaller and faster to download.


Also, different languages handle passing arguments differently. Some languages default to passing a copy and you need to use a different syntax to pass a reference. Other languages only pass references so in this case you need to do some extra work as the programmer if you want pass by copy semantics.

Fukui
Dec 16, 2005, 05:08 PM
Also, different languages handle passing arguments differently. Some languages default to passing a copy and you need to use a different syntax to pass a reference. Other languages only pass references so in this case you need to do some extra work as the programmer if you want pass by copy semantics.
Yeah, C# and Java pass by reference by default, don't they? (I have limited experience.)

Do you know any OO languages that pass by copy by default?

jeremy.king
Dec 16, 2005, 05:12 PM
...Java pass by reference by default don't they...

With exception of primitives and String.

mrichmon
Dec 16, 2005, 05:29 PM
Yeah, C# and Java pass by reference by default, don't they? (I have limited experience.)

Do you know any OO languages that pass by copy by default?

Arguably C++ is pass by value as a default. But that is a bit misleading.

Java as you say is pass by reference for objects but pass by value for atomic types, except in the case of using Remote Method Invocation. Using RMI the semantics for passing objects and atomic types are pass by value.

C# is pure pass by reference.

Python is also pass by reference.

Ruby only allows pass by reference.

PHP uses pass by value.

Lisp when using CLOS I believe uses pass by value semantics if my memory serves.

Smalltalk uses pass by reference semantics.

Oberon, like Pascal has explicit pass by value and pass by reference syntax.

Object COBOL (yes such evil exists in the world) allows pass by reference, pass by value and pass by content semantics. Pass by content means that a pointer to a copy of the data item is passed. How's that for twisted? :)

Fukui
Dec 16, 2005, 08:06 PM
Arguably C++ is pass by value as a default. But that is a bit misleading.

Java as you say is pass by reference for objects but pass by value for atomic types, except in the case of using Remote Method Invocation. Using RMI the semantics for passing objects and atomic types are pass by value.

C# is pure pass by reference.

Python is also pass by reference.

Ruby only allows pass by reference.

PHP uses pass by value.

Lisp when using CLOS I believe uses pass by value semantics if my memory serves.

Smalltalk uses pass by reference semantics.

Oberon, like Pascal has explicit pass by value and pass by reference syntax.

Object COBOL (yes such evil exists in the world) allows pass by reference, pass by value and pass by content semantics. Pass by content means that a pointer to a copy of the data item is passed. How's that for twisted? :)
Wow, that's a lot of info.
Hmm, most pass by reference, that's pretty interesting...

Aarow
Dec 16, 2005, 08:51 PM
A pointer?

Fukui
Dec 16, 2005, 08:57 PM
A pointer?
Didn't your mother ever tell you it's rude to point? ;)

mrichmon
Dec 16, 2005, 10:36 PM
Wow, that's a lot of info.
Hmm, most pass by reference, that's pretty interesting...

Most OO languages pass by reference. A key reason is that most OO languages are designed to eliminate pointers. There is a subtle distinction between a pointer and a reference... a pointer is a memory location and can be incremented and decremented whereas a reference is just a handle to an object. Under the covers a reference is often implemented using a memory address but does not have to be implemented that way. The key thing is that a programmer should not be able to modify a reference to get access to another object. But a programmer can for example increment a pointer to access the next memory block.

Pass by value was also mostly promoted as a way to avoid reliance on side-effects. Better program structuring theory and more focus on avoiding side-effects in programming courses has virtually eliminated the reliance on side-effects in modern code.

Fukui
Dec 17, 2005, 01:00 AM
Under the covers a reference is often implemented using a memory address but does not have to be implemented that way. The key thing is that a programmer should not be able to modify a reference to get access to another object. But a programmer can for example increment a pointer to access the next memory block.
It's interesting. I wonder if it would even be possible to implement a programming language and runtime in, say, C# that implements C#... IOW, programming in C one could make another C runtime, a C compiler, or other languages, so the C runtime is implemented in C. But could a Java runtime or C# runtime be implemented in those languages, if they hide pointers? Isn't that a kind of weakness?

I'm not sure there's actually a reason to get rid of pointers, though it sounds nice at first... since the basic design of every computer is to use a buffer of memory, why hide it?

mrichmon
Dec 17, 2005, 05:21 AM
It's interesting. I wonder if it would even be possible to implement a programming language and runtime in, say, C# that implements C#... IOW, programming in C one could make another C runtime, a C compiler, or other languages, so the C runtime is implemented in C. But could a Java runtime or C# runtime be implemented in those languages, if they hide pointers? Isn't that a kind of weakness?

I'm not sure there's actually a reason to get rid of pointers, though it sounds nice at first... since the basic design of every computer is to use a buffer of memory, why hide it?

Implementing a language runtime, or more commonly a compiler for a language since many languages are compiled directly to native machine code, in the language itself is common practice. For example, a C compiler is generally written in C. There is a well defined process for getting to the point where a compiler for a new language is implemented using the new language itself and you also have a compiled version of the compiler. The process is known as "bootstrapping".

Can a Java runtime be implemented in Java? Yes, there have been several projects to do just that. The fact that pointers are hidden in the language makes some aspects of the implementation difficult, but it can be worked around. The more difficult issues in implementing a Java runtime in Java are that there are no explicit mechanisms for the programmer to allocate/deallocate memory, no atomic locking mechanisms, and no way to control OS-level threads in Java. But these issues can also be worked around.

In some ways the omission of pointers from a language could be seen as a weakness. But what is really happening in these languages is that the language designer is removing the need for a programmer to explicitly manage memory allocation. Memory allocation and pointer manipulation is one of the biggest sources of bugs, complexity and inefficiency in code. So the argument is that if you remove memory allocation and pointers from the language and rely on an automatic memory management mechanism, then there is less chance for dangerous and difficult-to-track-down memory-related bugs. In effect, removing pointers makes the language easier to use for most tasks.

The trade off is that some types of programs are a little more difficult to write. In most cases the programs that are made more complex are things like runtimes and operating systems that very few programmers ever actually implement. If you have automatic memory collection, specifically garbage collection, then it is generally not possible to provide pointers in a language. A reason for this is that the garbage collector moves objects around in memory. With references, these object movements can be transparent since the reference does not refer to a specific location in memory. Pointers however refer to a specific location in memory so will break in the context of a garbage collector.

Another way to look at it is to say that we have a language which is perfectly designed for implementing runtimes and operating systems. That language is C. A language like Java doesn't have the same level of flexibility as C, since Java is abstracted further away from the underlying machine than C is. However, Java is a much safer language, in which programmers are generally more efficient in terms of the time it takes to produce working code to solve a problem.

The real question is whether every language needs to be able to easily solve every possible programming problem, or whether it is better to accept limitations in some languages when those limitations are the result of providing features that would otherwise be impossible to provide in the language.

BTW, though C# is a reference-based garbage collected language it does allow the use of pointers by the programmer. The trick that C# uses is to allow the programmer to explicitly pin an object into its current memory location until the programmer unpins the object.

satans_banjo
Dec 17, 2005, 05:50 AM
the language i'm learning is C, but my main aim is to move on from C to ObjC/Cocoa

Fukui
Dec 17, 2005, 03:44 PM
In some ways the omission of pointers from a language could be seen as a weakness. But what is really happening in these languages is that the language designer is removing the need for a programmer to explicitly manage memory allocation. Memory allocation and pointer manipulation is one of the biggest sources of bugs, complexity and inefficiency in code.

Right, but I wonder if there couldn't be a kind of compromise. Instead of "Must Garbage Collect" or "Must Hide Pointers", provide a layered approach: a base implementation using functions, pointers, and no collection or bounds checking; then, built on that, an object layer; then add collection, etc. Then there wouldn't be any translation "layers" like JNI or the C# bridge...


If you have automatic memory collection, specifically garbage collection, then it is generally not possible to provide pointers in a language. A reason for this is that the garbage collector moves objects around in memory. With references, these object movements can be transparent since the reference does not refer to a specific location in memory. Pointers however refer to a specific location in memory so will break in the context of a garbage collector.

I didn't know garbage collectors manipulate pointers... I thought they just kept references to the memory (pointer) and then nulled it once it had no references... but then again, I guess that's why they hide the pointers: that's how they count references! I thought instead the runtime would check whether the code had pointers to the location of an object or struct, and if all the pointers to that particular location were nulled, then it would be freed...

It's interesting, I wonder how they'll implement the garbage collector in Obj-C. Probably only objects could be collected, but then they don't hide the pointers...

Thanks for the info.

Fukui
Dec 17, 2005, 03:50 PM
the language i'm learning is C, but my main aim is to move on from C to ObjC/Cocoa
Yeah, that's one of the things I like about Obj-C: you learn C, and that's a strength, and you learn OO principles along with it. That, and I love the syntax and API; it somehow doesn't feel so much like programming. Looking at plain C or Java or JScript, I find it feels too terse and "machine"-like. Though C et al. is very powerful.

mrichmon
Dec 17, 2005, 04:08 PM
Right, but I wonder if there couldn't be a kind of compromise. Instead of "Must Garbage Collect" or "Must Hide Pointers", provide a layered approach: a base implementation using functions, pointers, and no collection or bounds checking; then, built on that, an object layer; then add collection, etc. Then there wouldn't be any translation "layers" like JNI or the C# bridge...


I didn't know garbage collectors manipulate pointers... I thought they just kept references to the memory (pointer) and then nulled it once it had no references... but then again, I guess that's why they hide the pointers: that's how they count references! I thought instead the runtime would check whether the code had pointers to the location of an object or struct, and if all the pointers to that particular location were nulled, then it would be freed...

It's interesting, I wonder how they'll implement the garbage collector in Obj-C. Probably only objects could be collected, but then they don't hide the pointers...

Thanks for the info.

The issue is that good garbage collectors move the object around in memory. What you need is for the reference to be independent of the location that the object is stored in memory. A pointer references a particular location in memory so if the object is moved then the pointer no longer points to the object.

It's not clear what precisely you mean by a layered approach. But in general you can do whatever you want when designing a programming language. The trick is working out how to implement it efficiently.

The GC used in Objective-C is a rather simple collector and has several serious flaws. The most obvious of which is that it relies on reference counting. With a reference counting collector, any loop of objects such as object A holding a reference to object B and object B holding a reference to object A will never be collected no matter whether no other objects hold a reference to A or B. This results in a memory leak.

mj_1903
Dec 17, 2005, 04:22 PM
The GC used in Objective-C is a rather simple collector and has several serious flaws. The most obvious of which is that it relies on reference counting. With a reference counting collector, any loop of objects such as object A holding a reference to object B and object B holding a reference to object A will never be collected no matter whether no other objects hold a reference to A or B. This results in a memory leak.

If you end up with a situation like that then it is time to either rewrite that block of code or move to autoreleased objects.

I would hate to think of an instance where two objects would retain references to each other... you are more likely going to end up with a crash than a memory leak.

Fukui
Dec 17, 2005, 04:25 PM
It's not clear what precisely you mean by a layered approach. But in general you can do whatever you want when designing a programming language. The trick is working out how to implement it efficiently.

Well, like in Java, everything is hidden, and now they're complicating the language even more with 1.5. C# is adding lots of complicated syntax in 2.0 and 3.0. It just seems like they first try to eliminate a lot of complexity, then later they realize, "oops, we need to do XYZ but we can't because the base is too inflexible, so we've got to extend the language."

Why can't a language instead build on a more flexible (but not as safe) base (like C), and then, using C (the base implementation), add abstractions that make it simpler as one goes up through the levels? IOW, say I had Obj-C with a full garbage collector and a JIT and bounds checking, etc. I could call C, assembly, or procedural code whenever I wanted, with no native-interface nonsense, because it's actually all just C code I'm using; the pre-processor and runtime handle it all for me...

JMHO, but it seems kind of backwards to provide a simple and inflexible base and then add complexity to it later (like Java and C#)... I guess I just "want it all" :)

mrichmon
Dec 17, 2005, 08:06 PM
If you end up with a situation like that then it is time to either rewrite that block of code or move to autoreleased objects.

I would hate to think of an instance where two objects would retain references of each other... you are more likely going to end up with a crash than a memory leak.

That is far from true. A doubly linked list is a commonly used data structure and requires exactly that sort of structure. If you then release the head of the list, the elements in the list will not be collected by a garbage collector that is based on reference counting. Instead you need at least a mark-and-sweep collector, or a generational collector.

How exactly do you think things will crash? By magic?

mrichmon
Dec 17, 2005, 08:23 PM
Well, like in Java, everything is hidden, and now they're complicating the language even more with 1.5. C# is adding lots of complicated syntax in 2.0 and 3.0. It just seems like they first try to eliminate a lot of complexity, then later they realize, "oops, we need to do XYZ but we can't because the base is too inflexible, so we've got to extend the language."


The syntax of a language has very little relation with the semantics provided by the language. The trick with extending a language like Java is to do so in a way that will work with previously written code since there is a huge user base already.


Why can't a language instead build on a more flexible (but not as safe) base (like C), and then, using C (the base implementation), add abstractions that make it simpler as one goes up through the levels? IOW, say I had Obj-C with a full garbage collector and a JIT and bounds checking, etc. I could call C, assembly, or procedural code whenever I wanted, with no native-interface nonsense, because it's actually all just C code I'm using; the pre-processor and runtime handle it all for me...


You could. Whether you would get the performance you want is an open question. Then there is the question of whether anyone else would want to use the language. There are literally thousands of languages out there implemented by someone who wanted a programming language to work in a different way.

Also, getting Objective-C to have a full garbage collector is a major technical challenge. Working out the appropriate semantics and behavior for a JIT for Objective-C is also a major technical challenge, particularly since Objective-C is not an interpreted language. If you are running C code then there is no runtime to be concerned with.

All of these things you are suggesting are major pieces of work, each of which a good grad student would need to study for 3 years or so. If they actually work out an implementation then the student would have done sufficient work to get a PhD.

But you are right: if you had Objective-C with a good GC, and if you redesigned some of the ugly semantics out of Objective-C, then it is likely that you would be able to call native code directly. However, at that point you would not have a language that produces binary code portable across any number of platforms the way Java and C# are.


JMHO, but it seems kind of backwards to provide a simple and inflexible base and then add complexity to it later (like Java and C#)... I guess I just "want it all" :)

That's not quite how it works, but based on this short discussion I can see why you view it this way. In my previous posts I have given highly simplified explanations of certain language elements. In practice you cannot view these elements in isolation, since many different aspects of a language interact, forcing certain other choices to be made and similarly preventing you from using certain techniques.

Ultimately, language design is a complex field.

Fukui
Dec 17, 2005, 08:59 PM
You could. Whether you would get the performance you want is an open question. Then there is the question of whether anyone else would want to use the language. There are literally thousands of langauges out there implemented by someone who wanted a programming language to work in a different way.

I guess the idea would be that since it's natively C-based, it would be easy to integrate with existing code bases... programmers could choose the level of abstraction they wanted to code in, and since the syntax and base API are the same, a lot of code could be copied and pasted between them... (though I guess with a good parser any Java code and classes could be copied into C# form pretty easily, since they seem so similar...)

I remember Bill Gates once said "the difference between languages is largely just syntax," which was one justification for the CLR... it's an interesting idea; I wonder if there's some truth to it? It kind of sucks in a way if true; it makes it sound as if there's no difference anymore, or ever will be...

Thanks for your detailed reply BTW.

HiRez
Dec 17, 2005, 11:25 PM
The GC used in Objective-C is a rather simple collector and has several serious flaws. The most obvious of which is that it relies on reference counting. With a reference counting collector, any loop of objects such as object A holding a reference to object B and object B holding a reference to object A will never be collected no matter whether no other objects hold a reference to A or B. This results in a memory leak.

First of all, what Obj-C garbage collector are you talking about? Is it for some open-source implementation or something... surely Apple is not adding a GC to theirs, are they? Anyway, couldn't you just make it so that if the GC detects such a reference loop while trying to delete an object the user has released, it nulls out the references on the other side of the object as well (IOW, the objects connected to that object with two-way pointers)? In the case of a doubly linked list you'd still have to do some manual patching of references when deleting elements, but I would expect to have to do that myself anyway.

Fukui
Dec 17, 2005, 11:55 PM
First of all, what Obj-C garbage collector are you talking about? Is it for some open-source implementation or something...surely Apple is not adding a GC to theirs are they?
Yep, check

- (void)finalize
(http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSObject.html#//apple_ref/occ/instm/NSObject/finalize)
This is not yet implemented in Tiger, but it's coming.

HiRez
Dec 18, 2005, 01:37 AM
- (void)finalize
(http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSObject.html#//apple_ref/occ/instm/NSObject/finalize)
This is not yet implemented in Tiger, but it's coming.

Wow. :eek: How did I miss that!? Intriguing, yet scary!

GeeYouEye
Dec 18, 2005, 10:35 AM
That is far from true. A doubly-linked list is a commonly used data structure and requires exactly that sort of structure. If you then release the head of the list, the elements in the list will not be collected by a garbage collector that is based on reference counting. Instead you need at least a mark-and-sweep collector, or a generational collector.

How exactly do you think things will crash? By magic?

But then you never see "NSDoublyLinkedList" in any framework; the right tool for the right job -- structs, in this case.

And the currently implemented reference counting scheme is not a garbage collector. The runtime calls -dealloc on any object with a retain count of 0. But that's it. -dealloc can be implemented however you want it to be. -dealloc doesn't have to deallocate the object. It could paint a pretty picture on the screen, and NSLog "screw you, I'm not going to be deallocated" to the console. The only guarantee of anything being actually deallocated is the [super dealloc] message that's almost always at the end of the method. But it doesn't have to be.
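The overridable-dealloc behavior described above can be sketched in plain C, where "dealloc" is just a function pointer the object supplies. Names and structure here are hypothetical, purely to illustrate that the runtime only promises to *call* dealloc at retain count zero, not that dealloc actually frees anything:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Obj Obj;
struct Obj {
    int refcount;
    void (*dealloc)(Obj *self);   /* overridable, like -dealloc */
};

/* The default actually frees: this plays the role of [super dealloc]. */
void default_dealloc(Obj *self) { free(self); }

/* An override that refuses to deallocate, as the post jokes. */
void defiant_dealloc(Obj *self) {
    (void)self;
    printf("screw you, I'm not going to be deallocated\n");
}

Obj *obj_new(void (*dealloc)(Obj *)) {
    Obj *o = malloc(sizeof *o);
    o->refcount = 1;
    o->dealloc = dealloc ? dealloc : default_dealloc;
    return o;
}

void obj_retain(Obj *o)  { o->refcount++; }

/* The "runtime": when the count hits 0, call whatever dealloc is. */
void obj_release(Obj *o) { if (--o->refcount == 0) o->dealloc(o); }
```

With `defiant_dealloc` installed, the count reaches zero, the message is printed, and the object simply lives on: exactly the "only guarantee is [super dealloc]" point being made.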

The ability to override dealloc is definitely a good thing, especially if you're working with expensive singletons (anyone on the Cocoa-dev list? there was a HUGE thread about this recently).

Then there's the whole autorelease pool thing... Personally, I find that has just as much flexibility in memory management as a GC, if not more.

A garbage collector would be very difficult to implement in Objective-C, because of the language's dependence on C and its impure object orientation (how many classes store numbers as ints and floats rather than NSNumbers?). Objective-C has a lot of power because one can bridge the gap between interface (OO) and implementation (procedural C) without any special constructs, but this means there are a lot of pitfalls. I imagine the Obj-C GC will either only work on objects, or will be quite an accomplishment: a universal C garbage collector. Given the presence of -finalize, I'm going to guess the former.

devman
Dec 18, 2005, 10:51 AM
C-compatible garbage collector info: http://www.iecc.com/gclist/GC-faq.html. Great Circle by Geodesic was one that I knew of for C++. You can still find reviews of the product if you google it. Here's one:

Great Circle 2.1

Until recently, Great Circle was positioned as a garbage-collection utility for C and C++ programmers. It still does that, but Geodesic Systems has found that while most C programmers understand that they need to fix memory leaks (which Great Circle does), they may not understand how garbage collection can help.

Consider a C program with dynamic memory allocations: malloc calls to allocate memory, and free calls to release that memory. Normally if you malloc without a later free, the memory leaks. If you free a block twice, the memory heap becomes corrupt. Once you link it into your program, Great Circle lets you dispense with free calls. Instead, it scavenges memory in the background and periodically releases dead memory blocks: blocks for which no valid pointer is in scope. Great Circle can also report what it's doing.
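The two malloc/free failure modes that review describes look like this in plain C (the function names are illustrative, not Great Circle's API):

```c
#include <stdlib.h>
#include <string.h>

/* Allocates a heap copy of a string; the caller owns the result. */
char *make_copy(const char *s) {
    char *p = malloc(strlen(s) + 1);
    if (p) strcpy(p, s);
    return p;
}

int demo_leak(void) {
    char *a = make_copy("hello");
    a = make_copy("world");  /* the "hello" block is now unreachable: a leak */
    free(a);                 /* correct: exactly one free per live block */
    /* free(a); */           /* a second free(a) here would corrupt the heap */
    a = NULL;                /* nulling after free prevents accidental reuse */
    return 0;
}
```

A conservative collector of the kind the review describes scans memory for values that look like pointers; once no such value refers to the "hello" block, it can reclaim it even though the program never called free.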

Effectively, Great Circle acts as both a diagnostic and an immediate cure for memory leaks. Since it can fix memory leaks coming from third-party libraries and DLLs, as well as leaks coming from your own code, it can sometimes prove indispensable. And if you rely on Great Circle from the beginning of a project, instead of constantly trying to match your memory allocations, you can save an enormous amount of programming time.

Great Circle doesn't really compete directly with BoundsChecker; it's more of a complementary tool. It detects only a few out of the thousands of possible problems in Windows programs, while BoundsChecker detects essentially all of them. On the other hand, BoundsChecker does not actually fix anything, while Great Circle almost magically turns a program that leaks like a sieve into something solid and stable without requiring you to change a line of code.

All just FYI. (Note: Geodesic are no longer around; it was about 6-7 years ago that I saw a team using Great Circle.)

Fukui
Dec 18, 2005, 02:29 PM
Wow. :eek: How did I miss that!? Intriguing, yet scary!
Yea, and I'm wondering why we need to implement finalize if a garbage collector is gonna take care of dealloc-ing everything anyways... which leads me to think it's just collecting objects; structs and C arrays etc. are gonna have to be freed manually... just guessing.

HiRez
Dec 18, 2005, 04:35 PM
which leads me to think it's just collecting objects; structs and C arrays etc. are gonna have to be freed manually... just guessing.

Probably so, but that would be OK with me; I rarely find myself actually using C arrays or structs when I can use NSArray or a custom NSObject subclass... IMO better to keep everything in "Objectville" unless there's a compelling reason not to. I do tend to use a lot of floats and ints instead of NSNumbers, but since those don't need to be allocated manually, I don't know why it'd be a problem to have them garbage-collected. I wonder if Apple could just automatically store a programmer's primitive scalar types as NSNumber objects behind the scenes anyway, thus allowing them to be directly stored into NS collection classes. I rarely use structs except for NSPoint, NSRange, NSSize, etc., which are structs but could, and I think should, just be implemented as first-class objects anyway.

HiRez
Dec 18, 2005, 04:57 PM
Yea, and I'm wondering why we need to implement finalize if a garbage collector is gonna take care of dealloc-ing everything anyways...

I think -finalize gives you a chance to do your own clean-up before your object is trashed, for example to close a file or net connection that your object opened, or to remove the object from an NSNotificationCenter. IOW, all the same stuff you would do in -dealloc, except that you wouldn't necessarily have to null out references that your object alone creates and accesses (although it's probably a good idea to do so anyway). I remember when I was using Java, I got into the habit of manually nulling out references I was done with, even though it had automatic garbage collection. This ensured the object would be collected ASAP and didn't leave a dangling pointer somewhere.

Catfish_Man
Dec 18, 2005, 08:46 PM
I think -finalize gives you a chance to do your own clean-up before your object is trashed, for example to close a file or net connection that your object opened, or to remove the object from an NSNotificationCenter. IOW, all the same stuff you would do in -dealloc, except that you wouldn't necessarily have to null out references that your object alone creates and accesses (although it's probably a good idea to do so anyway). I remember when I was using Java, I got into the habit of manually nulling out references I was done with, even though it had automatic garbage collection. This ensured the object would be collected ASAP and didn't leave a dangling pointer somewhere.

I'm a little unclear on why the GC can't simply call -dealloc... release/retain/autorelease will be overridden to be empty methods when GC is turned on, so if it weren't for the dealloc/finalize difference, then it seems like you could simply enable GC on an existing program and have it work. I'd be interested in hearing the logic behind this decision.

mrichmon
Dec 18, 2005, 09:39 PM
I remember when I was using Java, I got into the habit of nulling out references I was done using manually, even though it had automatic garbage collection. This ensured the object would be collected ASAP and didn't leave a dangling pointer somewhere.

By definition it is not possible to leave a dangling pointer in Java. This is because Java does not use pointers; rather, Java uses "references". Internally, a reference consists of more than just the memory location of an object.

The way you wind up with a dangling pointer is by deallocating the memory without voiding the pointer to that memory. In Java the programmer cannot explicitly deallocate memory, so it is impossible for a reference to exist without the object it refers to also existing.
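In C terms, the dangling pointer described above arises like this. A minimal sketch; `safe_free` is a hypothetical helper, not a standard function:

```c
#include <stdlib.h>

/* free() releases the memory but leaves the pointer variable holding
   the old address; voiding it alongside the free closes that gap. */
void safe_free(int **pp) {
    free(*pp);
    *pp = NULL;        /* later misuse now fails loudly instead of silently */
}

int demo_dangling(void) {
    int *p = malloc(sizeof *p);
    *p = 42;
    free(p);           /* p now dangles: it still holds the freed address */
    /* *p = 0;  <- undefined behavior: writing through a dangling pointer */

    int *q = malloc(sizeof *q);
    *q = 42;
    safe_free(&q);     /* by contrast, q is voided along with the free */
    return q == NULL;  /* 1: no dangling pointer remains */
}
```

Java forbids the first pattern by construction: there is no free(), so a live reference always implies a live object.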

Fukui
Dec 18, 2005, 09:41 PM
I think -finalize: gives you a chance to do your own clean-up before your object is trashed, for example to close a file or net connection that your object opened, or to remove the object from an NSNotificationCenter.
That makes more sense, yea, it'll be interesting to see what they do.
I hope they allow us to still controll the collection or turn it off for some objects like [object garbageCollect:NO]. I'll be sooooooo happy if/when the garbage collector nils out invalid pointers... no more messages to "dead objects" and crashes. I can just check "if (someobject)" and always expect if there's an object gone, I can re-instantiate it, or catch a bug without crashing most of the time.

mrichmon
Dec 18, 2005, 09:48 PM
I'm a little unclear on why the gc can't simply call -dealloc... release/retain/autorelease will be overridden to be empty methods when gc is turned on, so if it weren't for the dealloc/finalize difference, than it seems like you could simple enable gc on an existing program and have it work. I'd be interested in hearing the logic behind this decision.

The way a GC would deallocate the memory is simply by calling dealloc() after ensuring any release/finalize methods are invoked. The problem is how to implement a GC in the context of a language that uses pointers.

Efficient garbage collectors move objects around in memory so that live objects end up in one part of memory and garbage ends up in another; the garbage area is then deallocated in one block. In a language that uses pointers, moving objects means that you need to find all the pointers to a given object, move the object, then update the pointers, and repeat this for each object you are interested in. This is an expensive proposition. It is also difficult to identify values that are used as pointer addresses, since any integer type can be used to store a memory address (e.g., long foo = (long)some_pointer; another_pointer = (int *)foo;).

References remove this problem since it is generally not possible to assign an address to a reference in order to access a different object.
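The point about integers hiding addresses can be shown in a few lines of C (a deliberately contrived sketch, with invented names):

```c
#include <stdint.h>

/* Why moving collectors struggle with C: an address can hide inside a
   plain integer, invisible to any scan of "pointer-typed" slots. */
int roundtrip(void) {
    static int cell = 7;
    uintptr_t hidden = (uintptr_t)&cell;  /* looks like an ordinary number */
    int *back = (int *)hidden;            /* ...yet it is a live reference */
    return *back;  /* if a collector had moved 'cell', this would read a
                      stale address -- which is why C collectors either
                      refuse to move objects or must scan conservatively */
}
```

This round-trip through `uintptr_t` is perfectly legal C, which is exactly the problem: the collector cannot prove that `hidden` is *not* a pointer. Java-style references close this hole because there is no way to manufacture a reference from an arbitrary integer.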

Catfish_Man
Dec 20, 2005, 01:54 AM
<snipped>

Didn't really answer my question. Garbage collectors for C-based languages are nothing new; I was just wondering why they had to make a new Objective-C method that does exactly the same thing as -(void)dealloc (specifically, release any resources owned by the object).

devman
Dec 20, 2005, 08:43 AM
Didn't really answer my question. Garbage collectors for C-based languages are nothing new; I was just wondering why they had to make a new Objective-C method that does exactly the same thing as -(void)dealloc (specifically, release any resources owned by the object).

speculation_mode=on.

Because if they are adding GC then people won't be calling dealloc. In GC environments finalize is called by the GC when there are no more references to the object. Thus it is the hook you need to release any resources you may be holding, or to undo any external state stuff you may have done during construction or during the object's life.

So, it could be prepping for GC.

speculation_mode=off.

Catfish_Man
Dec 22, 2005, 04:29 PM
speculation_mode=on.

Because if they are adding GC then people won't be calling dealloc. In GC environments finalize is called by the GC when there are no more references to the object. Thus it is the hook you need to release any resources you may be holding, or to undo any external state stuff you may have done during construction or during the object's life.

So, it could be prepping for GC.

speculation_mode=off.

Well, it explicitly says in the documentation that it's called by the GC. However, dealloc is never called by programmers anyway; it's called by the retain-counting system, which can be thought of as a primitive form of GC.

devman
Dec 26, 2005, 08:29 AM
Well, it explicitly says in the documentation that it's called by the GC. However, dealloc is never called by programmers anyway, it's called by the retain counting system, which can be thought of as a primitive form of GC.

Ok, excluding your definition of GC (which includes the current retain system):

If they're adding what most people consider GC to be, then retain will become a no-op.

Catfish_Man
Dec 26, 2005, 07:34 PM
Ok, excluding your definition of GC (which includes the current retain system):

If they're adding what most people consider GC to be, then retain will become a no-op.

release/retain/autorelease will be overridden to be empty methods when gc is turned on

I know this. I know (roughly speaking) how it will work, I've read the emails to the gcc list, and the gnustep list, and the compiler documentation. I've looked into creating GC'd languages, and read about many others. My question is not about gc. It's about why they made a new name for dealloc instead of using the old one.

devman
Dec 27, 2005, 12:29 AM
I know this. I know (roughly speaking) how it will work, I've read the emails to the gcc list, and the gnustep list, and the compiler documentation. I've looked into creating GC'd languages, and read about many others. My question is not about gc. It's about why they made a new name for dealloc instead of using the old one.

oh, ok. Well we have to speculate again. I can think of two reasons.

If you were starting fresh and it was to be a GC environment, what would you call the plugin point for people to "do whatever they have to do" before an object is GCed? dealloc is hardly a good name for this.

Also, Java is a huge influence here. It's Java that put GC environments back into the mainstream.

mrichmon
Dec 27, 2005, 01:25 AM
If you were starting fresh and it was to be a GC environment what would you call the plugin point for people to "do whatever they have to do" before an object is GCed? dealloc is hardly a good name for this.

What you are describing is commonly called "finalizing" an object. GC semantics guarantee that the GC calls the appropriate finalization routine before it deallocates the object. In Java the finalization method is finalize(). Technically, in a GC-based system the programmer should not be explicitly dealloc'ing any object -- I guess this is the point of your question.

Java's finalize() method is implemented in java.lang.Object and is overridden by developers who need to perform finalization activity. When implementing your own finalize() method it is vital that the method does not result in the allocation of any new objects -- for example, no String concatenation -- otherwise you can end up in some nasty race conditions.

devman
Dec 27, 2005, 09:28 AM
What you are describing is commonly called "finalizing" an object. GC semantics guarantee that the GC calls the appropriate finalization routine before it deallocates the object. In Java the finalization method is finalize(). Technically, in a GC-based system the programmer should not be explicitly dealloc'ing any object -- I guess this is the point of your question.

Java's finalize() method is implemented in java.lang.Object and is overridden by developers who need to perform finalization activity. When implementing your own finalize() method it is vital that the method does not result in the allocation of any new objects -- for example, no String concatenation -- otherwise you can end up in some nasty race conditions.

Uh, you're talking to the wrong person; Catfish_Man asked the question. You're making the same point I am (with more detail): that in a GC environment a programmer isn't going to be dealloc'ing, and hence dealloc is a bad name for the plugin point for finalization activities. Read the last 4 or 5 replies in this thread.