macOS * Confusion in C

ArtOfWarfare · Jan 7, 2013

To figure out pointers, I thought of the * in C as having three proper usages:

1 - It may be used for multiplication, like here:

Code:

product = factor1 * factor2;

Or here:

Code:

runningProduct *= factor;

2 - It may be used to declare a pointer type variable, like these:

Code:

char* charPtr;
int* intPtr;
float* floatPtr;
double* doublePtr;
void* voidPtr;

Or here:

Code:

typedef struct {
    int a;
    int b;
} MyStruct;

MyStruct* myStructPtr;

3 - It may be used to dereference pointers, like here:

Code:

*intPtr = 3; // Sets the value at the address intPtr points to to 3.
int localInt = *intPtr; // localInt now holds the value at the address intPtr points to - which we know to be 3 in this example.

To make it quite clear which of the three ways I'm using the * character, I have a different way of putting spaces around it in each case.

When using it the first way, for multiplication, there is a space before and after it, " * ".

When using it the second way, to declare a pointer type, I put a space after but not before it, "* ".

When using it the third way, to dereference a pointer variable, I put a space before but not after it, " *".

Am I correct so far in thinking of the * as being used in three different ways? I believe I am, as I've been writing functional code with all that preceded for the last 2 years now.
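As a minimal sketch of the three uses side by side, the function below multiplies, declares a pointer, and dereferences it, using the spacing convention described above. The name star_demo and the local variable storage are hypothetical, chosen only for this illustration.

```c
/* Hypothetical helper showing all three uses of '*' in one place. */
int star_demo(int factor1, int factor2)
{
    int storage = 0;
    int* intPtr = &storage;        /* use 2: declares a pointer ("* ")        */

    *intPtr = factor1 * factor2;   /* use 3: dereference (" *") on the left,  */
                                   /* use 1: multiplication (" * ") on the right */
    *intPtr *= 2;                  /* dereference, then multiply-assign       */

    return *intPtr;                /* use 3 again: read through the pointer   */
}
```

Calling star_demo(3, 4) stores 12 through the pointer, doubles it, and returns the doubled value. The compiler ignores the spacing entirely; it only helps the human reader.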

I've decided that I'd like to try my hand at making a C IDE for iOS, partially because I'm not content with them, and partially because I suspect that in doing so, I'll master the C language.

So, having said that, I've run into this funky line in C...

This line declares a pointer, named ptr, to an array of characters.

Code:

char(* ptr)[];

I'm not sure what to make of the parentheses, though. They're separating the * from both char and [], making it seem like my pointer variable is declared as neither a char nor a [], if that makes any sense.

This couldn't be rewritten as:

Code:

char* ptr[];

Because that would make an array of char pointers rather than a single pointer to a char array.
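The difference between the two declarations can be made visible with sizeof. This is a sketch only: the array size 4 and both function names are assumptions added for illustration, since the declarations in the post leave the size unspecified.

```c
#include <stddef.h>

/* Size of the object that a pointer-to-array-of-4-char points to. */
size_t pointee_size_of_ptr_to_array(void)
{
    char storage[4] = { 0 };
    char (*ptr)[4] = &storage;   /* ONE pointer, to an array of 4 char */
    return sizeof *ptr;          /* size of the whole 4-char array     */
}

/* Size of an array holding four separate char pointers. */
size_t size_of_array_of_ptrs(void)
{
    char *ptr[4] = { 0 };        /* FOUR pointers, each to char */
    return sizeof ptr;           /* four times the size of a char pointer */
}
```

The parentheses force the * to bind to the name before the [] does, which is what turns "array of pointers" into "pointer to array".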

If I try to write it as:

Code:

char(char* ptr)[];

Except, well, no, because that doesn't compile.

While I'm on this topic... why are brackets placed after the identifier of the variable instead of after the type? When I write something like

Code:

int* ptr;

it can be easily read as "an int pointer named ptr"... note that the symbols/keywords come in the same order as in natural English.

But if I want an array like this:

Code:

int arr[];

it'd be most naturally read in English as "an int array named arr", but if I construe my English to have the same order as the symbols/keywords, I end up with "an int variable named arr which is actually an array".

I don't care about the array complaint as much as the pointer confusion... which actually confuses me.

Edit:

Somehow, I completely forgot about the rabbit hole that is C function pointers:
http://stackoverflow.com/questions/13247473/figuring-out-c-declarations

Here's an explanation of how to read them aloud:
http://c-faq.com/decl/spiral.anderson.html

So... now I'm feeling a bit more lost than before...

I'm going to go sleep on this now, I guess...

chown33 · Jan 7, 2013

You don't declare a pointer as "pointer to array of T", where T is some type. That's because there is no syntactical difference between "pointer to array of T" and "pointer to singular T". Here's why:

Every C pointer is implicitly a pointer to an array.

Every pointer can be subscripted, like so:

Code:

char * p = somethingThatReturnsPtrToChar;

char a = p[0];
char b = p[1];
char wtf = p[-1];

The wtf variable holds the char before the address where p is currently pointing. If p happens to point to the middle of a string, then p[-1] is the char before p[0].
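A small sketch of the negative subscript, assuming the pointer really does point past the first element of some array (otherwise the access is undefined behavior). The function name char_before is hypothetical.

```c
/* Returns the char stored just before the one p points at.
   The caller must guarantee that p - 1 is still inside the same array. */
char char_before(const char *p)
{
    return p[-1];   /* identical to *(p - 1) */
}
```

For example, if word holds "hello" and p is word + 2, then p[0] is the first 'l' and char_before(p) yields the 'e' in front of it.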

This may all seem strange, but that's because you're thinking of C arrays as different from pointers. The only difference between them lies in their definitions, not their types. Here's another strange truth:

Every array name is a constant pointer to the first array element.

For example, int bb[ 10 ]; the type of bb by itself is pointer to int. So every place you have an array subscript, think of the array name as being a constant pointer, and voila, pointer subscripting is identical to array subscripting.

A pointer variable can have operators that change it (like ++), and can be assigned a new value. A pointer constant cannot. So if you think of the array name bb by itself, with no subscript, you can do all things with bb that you can do with a variable pointer, except those things that modify the pointer itself. You can subscript it, you can add expressions as long as the result is stored in a different pointer, you can even use pointer arithmetic, like so:

Code:

int bb[ 10 ];

bb[ 0 ] = 98;
bb[ 1 ] = 2;
bb[ 2 ] = *bb + *(bb + 1);
*(bb + 3) = bb[0] + bb[1];

Think about what this code does before reading on. What values would you have in bb[0] thru bb[3] after the last statement has executed?
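One way to check an answer to the question above is to wrap the same four statements in a function and inspect the elements afterwards. The wrapper bb_element is a hypothetical helper, and zero-initializing the array is an addition so that the untouched elements have defined values.

```c
/* Runs the four assignments from the post and returns bb[index].
   index must be in the range 0..9. */
int bb_element(int index)
{
    int bb[ 10 ] = { 0 };

    bb[ 0 ] = 98;
    bb[ 1 ] = 2;
    bb[ 2 ] = *bb + *(bb + 1);      /* reads elements 0 and 1 via pointer syntax */
    *(bb + 3) = bb[0] + bb[1];      /* writes element 3 via pointer syntax       */

    return bb[index];
}
```

Printing bb_element(0) through bb_element(3) shows what each form of access actually read or wrote.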

The unary * operator (pointer dereference) is exactly equivalent to array subscripting with an index value of 0. And dereferencing via pointer arithmetic is identical to subscripting:

Code:

int * p = somethingThatReturnsPtrToInt;

int a = p[0];  int aa = *p;    // exactly equivalent
int b = p[1];  int bb = *(p + 1);    // exactly equivalent
int wtf = p[-1];  int wwtf = *(p - 1);    // exactly equivalent

You should definitely study C pointers, but you need to do so as exactly what it is: studying C pointers. Once you understand the equivalences between pointers and arrays, you will really have an understanding of C arrays.

You should also study C type declarations and definitions, because how types are defined is different from how operators are used.

The book "The C Programming Language" by K&R is the definitive book for pointers and type definitions, in my view, but some people need more examples and more detailed explanations.

Finally, I strongly advise against this style:

Code:

char* charPtr;

It makes it look like * is binding more tightly to char, but it's really not (read a C reference for type-declaration binding precedence and associativity). For example:

Code:

char* cc, dd;

This declares cc to have type ptr-to-char and dd to have type char (NOT ptr-to-char). To get both variables to have type ptr-to-char, you need this:

Code:

char* cc, * dd;

What does this define:

Code:

char* cc, ee[ 10 ];

Is ee an array of char, or array of ptr-to-char, or something else? What type is cc?
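For anyone who wants to check their answers, sizeof gives a quick hint about what each declarator produced. This is only a sketch: the second pointer is renamed ff (a hypothetical name) so both declarations can live in one file, and the helper functions are added for illustration.

```c
#include <stddef.h>

char* cc, dd;          /* the * belongs to cc only           */
char* ff, ee[ 10 ];    /* the [ 10 ] belongs to ee only      */

size_t size_of_cc(void) { return sizeof cc; }   /* size of a char pointer  */
size_t size_of_dd(void) { return sizeof dd; }   /* size of a single char   */
size_t size_of_ee(void) { return sizeof ee; }   /* size of 10 chars        */
```

Each declarator in a comma-separated list carries its own * or [], while the base type char on the left is shared by all of them.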

ArtOfWarfare · Jan 7, 2013

Thank you for the large amount of samples, Chown. The weirder valid code I can find, the better (because ultimately my app needs to accept all of it without complaining, while still pointing out invalid code.)

kage207 · Jan 7, 2013

Also keep in mind that an array in C is actually a list of pointers. A pointer is nothing more than a memory address which may contain one or many pointers (as you can ask for a pointer to point to a block of memory, hence an array).

lloyddean · Jan 7, 2013

kage207 said:
Also keep in mind that an array in C is actually a list of pointers.

Ah, do you wish to either rephrase or restate this?

ytk · Jan 7, 2013

Interestingly, an array reference in C is simply a macro. That is, "a[b]" is simply converted to "*(a + b)". The net result of this is that "foo[10]" is an equivalent statement to (the also syntactically valid) "10[foo]". Try it out if you don't believe me.
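A short sketch for trying this out. Strictly speaking no preprocessor macro is involved: the C standard defines E1[E2] as identical to (*((E1)+(E2))), and because addition is commutative the operands can be swapped. The function name same_element is hypothetical.

```c
/* Returns 1 if foo[i], i[foo] and *(foo + i) all name the same object. */
int same_element(int *foo, int i)
{
    return &foo[i] == &i[foo]      /* swapped operands, same address */
        && &foo[i] == foo + i;     /* plain pointer arithmetic       */
}
```

With an array of at least 11 ints, same_element(foo, 10) compares the addresses of foo[10] and 10[foo].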

lloyddean · Jan 7, 2013

Now that ..., I believe!

whooleytoo · Jan 8, 2013

ytk said:
Interestingly, an array reference in C is simply a macro. That is, "a[b]" is simply converted to "*(a + b)". The net result of this is that "foo[10]" is an equivalent statement to (the also syntactically valid) "10[foo]". Try it out if you don't believe me.

Interesting!

The highlighted bit is only true for an array of byte-sized entries I presume, or else the statement above is simplified slightly.

bearda · Jan 8, 2013

chown33 said:
You don't declare a pointer as "pointer to array of T", where T is some type. That's because there is no syntactical difference between "pointer to array of T" and "pointer to singular T". Here's why:

Every C pointer is implicitly a pointer to an array.

<SNIP>

Although I agree with the intent of what you said, there is some terminology there I don't really agree with. I think it's more accurate to say "In C every array is implicitly a pointer" and "In C every pointer is implicitly a pointer to a member of an array". You can declare a pointer to an array of T which has a distinctive type when compared to an array of T.

----------

whooleytoo said:
Interesting!

The highlighted bit is only true for an array of byte-sized entries I presume, or else the statement above is simplified slightly.

It's actually true in all cases. When you add to a pointer, the actual pointer value increases by the size of the base type. If I have a pointer to a 32-bit number and increment it by 1 (p++), the actual value of p increases by 4. This makes iterating through memory with a pointer a lot easier.
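The scaling can be measured directly by converting both pointers to char * and subtracting. This is a minimal sketch; the function name step_in_bytes is hypothetical, and int32_t is used to stand in for "a 32-bit number".

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes the address moves when 1 is added to an int32_t pointer. */
size_t step_in_bytes(void)
{
    int32_t values[2] = { 0, 0 };
    int32_t *p = values;
    int32_t *next = p + 1;                       /* one element further on */

    return (size_t)((char *)next - (char *)p);   /* distance in bytes      */
}
```

The result equals sizeof(int32_t), which is 4 on platforms with 8-bit bytes.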

chown33 · Jan 8, 2013

bearda said:
Although I agree with the intent of what you said, there is some terminology there I don't really agree with. I think it's more accurate to say "In C every array is implicitly a pointer" and "In C every pointer is implicitly a pointer to a member of an array".

I believe my original statement was accurate. There are no constraints in C for subscripting pointers (or arrays). If there were, then C would not have a problem with pointers that move past the end of an array and overrun the array's bounds. Yet that is exactly what happens with pointers or subscripts, absent some other level of bounds-checking: memory beyond the actual bounds of the array is read or written.

I'm not saying it's correct to use every pointer as a pointer to an array. That's the difference between explicit array pointers and implicit ones. If C had explicit array pointers, then array bounds would be part of the pointer type, and it would be impossible to move the pointer outside the array. But C doesn't do that. Instead, it lets you point any pointer anywhere, regardless of whether it's to a scalar value or to any array, and there is no way to express a distinction in the language. Thus, every C pointer is implicitly a pointer to an array.

You can declare a pointer to an array of T which has a distinctive type when compared to an array of T.

Please show an example.

Sydde · Jan 8, 2013

chown33 said:
Every C pointer is implicitly a pointer to an array.

There is fertile ground for confusion here. When I declare

Code:

NSString *aString;

it needs to be obvious that aString is not a pointer to an array of NSString objects and one should never try to use an object pointer as a pointer to an array. I think you might be able to get away with it in C++, but any time you "get away with" something, you should re-evaluate your design.

I suppose it is lamentable that NeXT did not decide to define object references as opaque types in order to reduce confusion.

firewood · Jan 8, 2013

Sydde said:
There is fertile ground for confusion here. When I declare

Code:

NSString *aString;

it needs to be obvious that aString is not a pointer to an array of NSString objects...

Actually, in C, it is... although, in Objective C, ARC would probably complain if you tried to assign a C array of object pointers (such as NSString objects) to aString. But with ARC disabled, you could assign a C array of retained NSString objects to it. Legal, but stylistically very poor.

chown33 · Jan 8, 2013

Sydde said:
There is fertile ground for confusion here. When I declare

Code:

NSString *aString;

it needs to be obvious that aString is not a pointer to an array of NSString objects and one should never try to use an object pointer as a pointer to an array. I think you might be able to get away with it in C++, but any time you "get away with" something, you should re-evaluate your design.

I was not referring to Objective-C or C++ pointers, only C pointers. I did this because the OP was specifically referring only to C (please see original post).

An Objective-C object pointer is not implicitly a pointer to an array of objects, because Objective-C does not allow non-pointer objects. There's no way to dereference a pointer to an object and get the "naked" object. The Objective-C compiler rejects such attempts (or should).

I suppose it is lamentable that NeXT did not decide to define object references as opaque types in order to reduce confusion.

I understand and sympathize with this, from a language design viewpoint, but I also understand the engineering tradeoffs that might counter it, especially given the time it was invented.

FWIW, NeXT didn't invent Objective-C, so they really didn't get much choice about what constitutes an object reference:
http://en.wikipedia.org/wiki/Objective-C#History

Objective-C was created primarily by Brad Cox and Tom Love in the early 1980s at their company Stepstone.
...
In 1988, NeXT licensed Objective-C from StepStone ...

As is often the case with C-derived languages, first attempts are typically pre-processors of some kind, and that limits the extent to which the underlying language syntax can be altered.

bearda · Jan 8, 2013

Your nomenclature seems to be slightly skewed from the norm.

For most developers an array is a variable that is declared. The declared keyword's value is a constant pointer, and that pointer is backed by a region of memory. A pointer is a memory address and a type enforced at compile time.

A pointer to an array of integers is a pointer to a pointer (int**). This can not be implicitly cast into an array of integers (int array[]), or into a pointer to a single integer (int* pointer).

lloyddean · Jan 8, 2013

A pointer is simply a location in memory of sufficient size to hold the address of another location in memory.

Sydde · Jan 8, 2013

bearda said:
Your nomenclature seems to be slightly skewed from the norm.

For most developers an array is a variable that is declared. The declared keyword's value is a constant pointer, and that pointer is backed by a region of memory. A pointer is a memory address and a type enforced at compile time.

A pointer to an array of integers is a pointer to a pointer (int**). This can not be implicitly cast into an array of integers (int array[]), or into a pointer to a single integer (int* pointer).

Not sure what you mean by normal nomenclature. In C, "array" is used to mean a contiguous regular sequence of data elements indexed in ascending order in memory with no implicit padding. A traditional C string, for instance, is literally an array of elements of type char. Whatever "most developers" call an array does not alter traditional C nomenclature. This is borne out by the fact that if you have a pointer to a thing, C assumes that you can index it. And the indexing is based on what the pointer says it is pointing to (or how you cast its target type).

Yes, it may be preferable to use indirection in arrays, but actual C nomenclature does not call those "proper arrays" vs. OIDK, "dirty dangerous arrays". An ordered sequence of elements is an array, be it comprised of pointers, bytes or structs.

jon3543 · Jan 8, 2013

To the OP: You've received several partially correct and blatantly wrong replies. People have tried to talk about the equivalence between arrays and pointers, and as is often the case, they've gotten it wrong. You cited the C FAQ; read chapter 6, "Arrays and Pointers" for the truth about that, and I suggest everyone who's posted here do the same. In particular, the claim that "arrays are constant pointers" should be ignored, because it is completely wrong. Hopefully everyone will go read the FAQ instead of trying to argue about this now, but long experience means I'm not optimistic about that.

ArtOfWarfare said:
So, having said that, I've run into this funky line in C...

This line declares a pointer, named ptr, to an array of characters.

Code:

char(* ptr)[];

Actually, it declares a pointer to an array of char of unspecified size, which is an incomplete type. Moreover, it's an incomplete type that apparently can never be completed, and you really can't use it for much of anything. For example:

Code:

char (*p)[];

void f()
{
   p[1][0];
}

This will give a compile error, because the size of the array that p is declared to point to is unknown, and the compiler cannot even in principle compute where p[1] begins, as it doesn't know the size of p[0]. Where'd you come across this strange thing? Usually, pointers to arrays are used like this:

Code:

#define M 10
#define N 20
char b[M][N];
char (*p)[N] = b;

This works due to the standard array-to-pointer conversion converting b into a pointer to its first element, which has the type char (*)[N], where N is a compile-time constant. (Note that M doesn't matter in the declaration of p.) Now p can be used very similarly to b, except it can be made to point to any such array. Note, however, that b is not a "constant pointer". There is no pointer anywhere in or around b.
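To see the pointer-to-array in use, here is a sketch built on the same declarations. The two helper functions are hypothetical additions; b and p are as above, with b given static storage so it starts out zeroed.

```c
#include <stddef.h>

#define M 10
#define N 20

static char b[M][N];

/* Writes through the row pointer, then reads the same cell via the array. */
char write_through_row_pointer(size_t row, size_t col, char value)
{
    char (*p)[N] = b;        /* b converts to a pointer to its first row */
    p[row][col] = value;     /* p is subscripted exactly like b          */
    return b[row][col];
}

/* Bytes skipped by p + 1: one whole row of N chars. */
size_t row_stride(void)
{
    char (*p)[N] = b;
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

Because N is part of p's type, the compiler knows that p + 1 lies N bytes further on, which is exactly the information missing in the char (*p)[] case.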

ArtOfWarfare said:
I'm not sure what to make of the parentheses, though. They're separating the * from both char and [], making it seem like my pointer variable is declared as neither a char nor a [], if that makes any sense.

This couldn't be rewritten as:

Code:

char* ptr[];

Because that would make an array of char pointers rather than a single pointer to a char array.

The meaning of that one depends on whether or not it's a function parameter. As a local or global variable, ptr would be an array of unspecified size, an incomplete type. As a function parameter, it would be a pointer due to the most evil thing in the C type system, the ability to use brackets in the first dimension interchangeably with the *, e.g. all these declare the same function:

Code:

void f(char** p);
void f(char* p[]);
void f(char* p[2]);
void f(char* p[200]);

Only the first one reflects the underlying reality, that p is a pointer. Note that this is true only for function parameters, which is where many people get the dead wrong idea that "arrays are pointers". You can easily prove they are all the same in C++ by giving any two of them bodies and trying to compile, which will fail due to the attempted redefinition. In C, which doesn't support function overloading, you could probably just add an additional declaration:

Code:

void f(int);

That should cause the compiler to emit at least a warning about the conflicting declaration. It would have complained even without the extra declaration, unless all four of the original f's were the same. (Aha!)

Again, it is beyond evil for the array syntax to work like this in this one context. It can take a long time to undo the damage it does to one's mind.
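The parameter adjustment can also be observed from inside a function. In this sketch the three prototypes coexist without complaint because they declare the same type, and the body increments p, which would be impossible if p were a real array. The function name count_strings and the NULL-terminated convention are assumptions made for the example.

```c
#include <stddef.h>

/* Three spellings of one and the same function type. */
size_t count_strings(const char **p);
size_t count_strings(const char *p[]);
size_t count_strings(const char *p[200]);

/* Counts the entries before the terminating NULL pointer. */
size_t count_strings(const char *p[200])
{
    size_t n = 0;

    while (*p != NULL) {
        p++;    /* legal only because p is really a pointer, not an array */
        n++;
    }
    return n;
}
```

The 200 in the definition is ignored entirely; the caller can pass an array of any length as long as it ends with NULL.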

Elsewhere, your incomplete type means something completely different. For example:

Code:

char* b[];

int main()
{
   b[0] = 0;
}

This will compile in VC++ 2013 but die with a linker error, because b has incomplete type and therefore does not define an object. It can be fixed by completing the type in a subsequent declaration or other translation unit, e.g. the former:

Code:

char* b[];
char* b[10];

int main()
{
   b[0] = 0;
}

Now it will compile and link fine. Note that b is an array type, not a pointer type.
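A quick way to confirm that b is an array rather than a pointer is to ask for its element count once the type has been completed. This is a sketch in C (not C++, where the first declaration is not allowed); the helper b_length is hypothetical.

```c
#include <stddef.h>

char* b[];      /* incomplete array type: size not yet known */
char* b[10];    /* same object, type now completed           */

/* Number of elements in b, computable only because b is an array. */
size_t b_length(void)
{
    return sizeof b / sizeof b[0];
}
```

If b were a pointer, sizeof b would be the size of one pointer and the division would not give the element count.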

In general, incomplete types are most used in forward declarations, for example, declaring pointers to types that cannot be completed until sometime later:

Code:

struct S;

struct T
{
   struct S* s; // S is incomplete here
};

struct S
{
   struct T t; // T has to be complete for this to work.
};

It's sort of a chicken and egg situation, because T cannot contain an S, but it can contain a pointer to an S by using a forward declaration of S.

ArtOfWarfare said:
If I try to write it as:

Code:

char(char* ptr)[];

Except, well, no, because that doesn't compile.

While I'm on this topic... why are brackets placed after the identifier of the variable instead of after the type? When I write something like

Code:

int* ptr;

it can be easily read as "an int pointer named ptr"... note that the symbols/keywords come in the same order as in natural English.

But if I want an array like this:

Code:

int arr[];

it'd be most naturally read in English as "an int array named arr", but if I construe my English to have the same order as the symbols/keywords, I end up with "an int variable named arr which is actually an array".

A major goal of the C type system was for declaration to mimic usage. The result of achieving that goal has historically been a source of major confusion and complaints. You are by far not the first to make such observations.

I don't care about the array complaint as much as the pointer confusion... which actually confuses me.

Edit:

Somehow, I completely forgot about the rabbit hole that is C function pointers:
http://stackoverflow.com/questions/13247473/figuring-out-c-declarations

Here's an explanation of how to read them aloud:
http://c-faq.com/decl/spiral.anderson.html

So... now I'm feeling a bit more lost than before...

I'm going to go sleep on this now, I guess...

I repeat, everyone in this thread who spoke about arrays and pointers being the same thing needs to go read Chapter 6 in the C FAQ, which is all about their differences, which are profound. If I were to sum it up and ignore the evil syntax stuff I touched upon earlier and many other things for brevity, I would say:

Arrays are not pointers. They don't contain pointers. For example, there is not a single pointer anywhere in or around int[10], int[10][20], int[10][20][30], etc. An array is an area of storage containing elements of a single type, one after the other, with no space in between them. There is a standard conversion employed in most contexts called the array-to-pointer conversion that produces a pointer to an array's first element given the array's name. It is from this conversion that the pointer-like behavior of arrays arises. For example, when you say b[2] for some array b, it is converted to *(b+2) per the array indexing rules, and b undergoes the array-to-pointer conversion, producing a pointer to b's first element. While similar to saying p[2], where p is an actual pointer pointing to b's first element, the difference is, the compiler had to fetch the contents of the pointer p to compute the address p+2, whereas it "knows" where the array b starts in memory, so there is no runtime fetching of a pointer value to compute the address b+2. Here are a couple of examples of array types and the types of their first elements:

Code:

int a1[10];
int* p1 = a1;
int a2[10][20];
int (*p2)[20] = a2;

In each pointer initialization, the array undergoes the array-to-pointer conversion, producing a pointer to its first element. In particular, note that the following is wrong:

Code:

int** p2 = a2; // Bad!

That's because a2 is an array, not a pointer. That said, int** can be used to simulate a 2D array, and it is a typical low-level implementation of fully dynamic 2D arrays, where the bounds aren't known at compile time. But if I continue like this, I will pretty much be writing my own C FAQ, and since that has already been done, please go read the real one.
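For completeness, here is a rough sketch of the int** approach to a fully dynamic 2D array mentioned above. The names grid_create and grid_destroy are hypothetical, and this is only one of several possible layouts (each row is a separate allocation here).

```c
#include <stdlib.h>

/* Allocates rows x cols zeroed ints as an array of row pointers.
   Returns NULL on allocation failure. */
int **grid_create(size_t rows, size_t cols)
{
    int **grid = malloc(rows * sizeof *grid);
    if (grid == NULL)
        return NULL;

    for (size_t r = 0; r < rows; r++) {
        grid[r] = calloc(cols, sizeof **grid);
        if (grid[r] == NULL) {
            while (r > 0)            /* undo the rows already allocated */
                free(grid[--r]);
            free(grid);
            return NULL;
        }
    }
    return grid;
}

/* Frees every row, then the array of row pointers. */
void grid_destroy(int **grid, size_t rows)
{
    if (grid == NULL)
        return;
    for (size_t r = 0; r < rows; r++)
        free(grid[r]);
    free(grid);
}
```

Unlike int a2[10][20], this structure really does contain pointers: grid[r][c] first loads the row pointer grid[r] from memory and only then indexes into that row.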

Sydde · Jan 9, 2013

The confusion arises from the fact that the syntax for accessing a primitive array can be the same whether the array is statically allocated or an arbitrary location designated by a pointer. For a pointer, the term *ptr is exactly equivalent to ptr[0], which is the same phrasing one uses with a statically allocated array. Similarly, if you have an indirect pointer (e.g., "int **indirectPtr"), you can use the index phrasing in place of the indirection operator: "**indirectPtr" is exactly equivalent to "indirectPtr[0][0]". In a very real sense, any variable label can be thought of as a sort of symbolic pointer because it designates a location in memory (and its location can be obtained with the "&" operator).

What the OP should learn from this confusion is the value of using typedefs. Most serious programmers do not use constructions that are difficult to comprehend because they want code that is easy to maintain.

jon3543 · Jan 9, 2013

Sydde said:
What the OP should learn from this confusion is the value of using typedefs. Most serious programmers do not use constructions that are difficult to comprehend because they want code that is easy to maintain.

Typedefs are just synonyms for types, and they don't relieve you from understanding how the type system works. Knowing about typedefs doesn't help the OP understand the strange incomplete types he asked about, nor do they help him understand the context-dependent meaning of the second one he gave, which are some of the things I talked about in my last message. In fact, typedefs can introduce their own confusion. For example, everyone trips up on this:

Code:

typedef char* CharPtr;
// Now define a pointer to const char using CharPtr. Most people will try:
const CharPtr p; // Wrong!
// The above is equivalent to CharPtr const p; and thus char* const p;
// There's no way to do it. You have to define another typedef
// basically from scratch:
typedef const char* ConstCharPtr;
ConstCharPtr p;
// Or you can just eschew typedefs altogether for this, which many prefer:
char* p1;
const char* p2;
// Someone who really knows his stuff and is on his game would actually prefer:
char const* p3;
// And he wouldn't have made the initial mistake, because he would have written it as:
CharPtr const p4;
// And he would know that defines a const CharPtr, not a pointer to a const char,
// and he'd realize the latter is impossible to achieve with CharPtr. The advantage
// of placing the cv-qualifiers like this is apparent if you read the declarations
// aloud from right-to-left, e.g. "p3 is a pointer to const char", and
// "p4 is a const CharPtr". It allows you to legitimately replace CharPtr
// with its definition and say, "p4 is a const pointer to char".

Typedefs can be helpful with more complex pointer types, but that's not the answer to the confusion about the so-called "equivalence of arrays and pointers". The answer to that confusion begins with what I wrote earlier, "There is a standard conversion employed in most contexts called the array-to-pointer conversion that produces a pointer to an array's first element given the array's name. It is from this conversion that the pointer-like behavior of arrays arises." That's the fundamental thing. Once you get that, then it's mainly a matter of learning about pointers.
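The const CharPtr pitfall can be confirmed with a few lines. In this sketch the function overwrite_first is a hypothetical helper; it compiles precisely because the const applies to the pointer and not to the char it points at.

```c
typedef char* CharPtr;

/* Writes c into buf[0] through a `const CharPtr`. */
void overwrite_first(char *buf, char c)
{
    const CharPtr p = buf;   /* same type as: char* const p = buf;      */

    *p = c;                  /* allowed: the pointed-to char is mutable */
    /* p = buf + 1; */       /* would not compile: p itself is const    */
}
```

Had p been a pointer to const char, the assignment through *p would have been rejected instead.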

lloyddean · Jan 9, 2013

And yet I've been programming using 'C' since '82 and I far and away prefer to use user defined types than not - personal preferences.

Cromulent · Jan 9, 2013

Sydde said:
In a very real sense, any variable label can be thought of as a sort of symbolic pointer because it designates a location in memory (and its location can be obtained with the "&" operator).

Except any variable declared with the register keyword which has no memory address since it is stored in a CPU register (assuming the compiler actually decides to do what you have told it to do).

jon3543 · Jan 9, 2013

lloyddean said:
And yet I've been programming using 'C' since '82 and I far and away prefer to use user defined types than not - personal preferences.

Hmmm. That combines an "argument from authority" (claimed long experience and by extension, implied competence) with the vague "personal preferences", which could be anything from "I like pie" to "Here are my detailed reasons for doing this, but I won't bother giving them", along with a healthy dose of overgeneralization to "user defined types", when the discussion had actually drifted to the specific subject of typedefs, which are merely "synonyms for types" like I said, and it further implies I was arguing against typedefs in general, when I explicitly indicated I was not. That's a heck of a lot to unravel!

So, that's either a very well-crafted troll, or you really do think it was a meaningful comment. Either way, there's probably not much point in my responding to it in any depth. Instead, having already written a couple of messages that discussed a number of things in depth, I'll just respond to your message with one containing exactly as much content, no more, no less, "I've been eating pie since 1970. I far and away prefer it to cake - personal preference."

lloyddean · Jan 9, 2013

That wasn't my intent at all. It was to point out how silly your statement seemed!

In other words, I'm not an idiot, you're not an idiot, and you shouldn't assume someone who does things differently than you is an idiot either.

jon3543 · Jan 9, 2013

lloyddean said:
That wasn't my intent at all. It was to point out how silly your statement seemed!

What "statement"? You've quoted nothing, and I have no idea what you're referring to.

In other words, I'm not an idiot, you're not an idiot, and you shouldn't assume someone who does things differently than you is an idiot either.

Again, you're posting things that have nothing to do with what I've said. I've "assumed" nothing of the sort you seem to be claiming. If you want to have a discussion about what I've written, please quote it and reply directly to it. I'll be happy to clarify the things you clearly don't understand, but I need to know specifically what they are.

Sander · Jan 10, 2013

Cromulent said:
Except any variable declared with the register keyword which has no memory address since it is stored in a CPU register (assuming the compiler actually decides to do what you have told it to do).

It has no obligation to, as the register keyword is "just a hint" (which is actually commonly ignored). In fact, the other way around is also true: The compiler may well decide to keep something in a register when the optimizer thinks that's a good idea, even if you didn't mention the register keyword.
